Closed. mzpqnxow closed this issue 3 years ago.
I have added PR #311 for this as a proposal
I have created a new PR for this, #332. Rather than rebase the original one (which already had a bunch of clunky commits in it) I found it easier to do it this way. Please see #332 instead of #311
Thanks!
change merged, will be part of release 1.3.1
## Problem

When exporting/downloading very large scans (implemented in the `export()` method of `tenable.io.ScansAPI`, here), it can be frustrating to wait for the download to complete without any simple, customizable way to gauge and report its progress, whether to the console or to some other resource over the network or on the local system. The download happens in a very tight loop, so there is no simple way to track progress, let alone perform custom actions as the data streams down.

My application regularly uses pyTenable to download scans that exceed 1GB, which can take upwards of 10 minutes to complete. This leaves a user at the console without any idea of an ETA, and it also prevents any supervising application from tracking progress or capturing granular metrics. As far as a process monitor is concerned, it is effectively a 10+ minute period of idle time. For a user, it is either a blank screen or very ugly (in my opinion) `logging` module output, depending on the log level set for the various loggers.
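The tight loop in question boils down to something like the following. This is a paraphrase for illustration, not the verbatim library code, and the default chunk size shown is an assumption:

```python
import io

def download(response, fobj, chunk_size=1024):
    # Paraphrased sketch of the transfer loop described above: requests
    # streams the body in fixed-size chunks, and nothing between
    # iterations reports progress to the caller.
    for chunk in response.iter_content(chunk_size=chunk_size):
        fobj.write(chunk)

# Tiny stand-in response object so the sketch can be exercised
# without a network connection.
class FakeResponse:
    def iter_content(self, chunk_size):
        data = b'0123456789'
        for i in range(0, len(data), chunk_size):
            yield data[i:i + chunk_size]

buf = io.BytesIO()
download(FakeResponse(), buf, chunk_size=4)
```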
## Possible Solutions

To provide a real-time status update to the user at the console (or to a remote logging solution), the options I am aware of are as follows:

- **Tune the `logging` system.** pyTenable uses the standard `logging` system for its own loggers (`tenable.io.scans`, for example) and, indirectly, for logging third-party activity (in this case `urllib3`, which is used by `requests`). I concede that it is simple to tune the log level for these loggers and make do with the information they provide, but that information is often very poorly formatted, too verbose, or not verbose enough, even when adjusting the log level on the loggers. It is also possible to add a custom log formatter, adapter, etc., but this is clunky and limited, as the Python `logging` module is not well suited for stateful progress reporting.
- **Add a hook to `ScansAPI.export()`.** Make a small change to `export()` to accept an optional callable as a keyword argument, permitting the caller to have the two-line `response.iter_content()` loop delegated to their own custom function. In my opinion this works well here because it is very lightweight: there is very little logic in the transfer loop, so the delegation can be done very simply.

## Ideal Solution
I came to the conclusion that the hook approach is the simplest and most flexible. The callable provided by the user will need to iterate over the stream using `requests.Response.iter_content()`, but this is very simple and compact logic. The user only needs to write each chunk to the `fobj` stream and is then free to add any small action after it, executed on each iteration (or every `n` iterations, if they prefer). It also gives them the ability to customize the chunk size, though I'm not sure the fixed size is a serious limitation as is.

This commit is only a few lines and takes this approach. It adds an optional kwarg (`stream_hook: Optional[Callable] = None`) to the `export()` method in `tenable.io.ScansAPI`. If set, the stream iteration loop is delegated to the user-specified function. Because the changes are minimal and optional, they are very unlikely to impact stability or performance in any non-negligible way for current users, as they only introduce one additional comparison.

## Caller Implementation
This is a very minimalist example of a `stream_hook` function that a user could implement in their application and pass to `export()`. It just emits a simple `tqdm` progress meter that shows an ETA and a transfer speed. The developer then only needs to add `stream_hook` to their call to `TenableIO.scans.export()`.
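The original snippet is not reproduced in this thread, so the following is a hedged sketch of what such a hook could look like. The calling convention `stream_hook(response, fobj, chunk_size)` is an assumption (the exact signature is the open question raised in the Conclusion):

```python
# Hypothetical sketch of a caller-side stream_hook; the signature
# stream_hook(response, fobj, chunk_size) is an assumption, not the
# confirmed pyTenable API.
from tqdm import tqdm

def stream_hook(response, fobj, chunk_size):
    # Content-Length may be absent on chunked responses; tqdm accepts
    # total=None and simply omits the ETA in that case.
    total = response.headers.get('content-length')
    with tqdm(total=int(total) if total else None, unit='B',
              unit_scale=True, desc='scan export') as bar:
        for chunk in response.iter_content(chunk_size=chunk_size):
            fobj.write(chunk)
            bar.update(len(chunk))
```

Passing it through would then look something like `tio.scans.export(scan_id, fobj=fobj, stream_hook=stream_hook)`, where everything except the proposed `stream_hook` kwarg is the existing API.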
## Alternate Uses
Besides using this as a basic progress meter for scan/report downloads for a user at the console, the general idea of providing simple, non-invasive hooks in key places has other use-cases I can think of, but this particular spot is the only one I'm really interested in. I'm not suggesting that every use of `requests.Response.iter_content()` be replaced with this sort of hook, though I would be happy to apply the same change anywhere else it may prove useful to other users.

As an example of a more substantive use, a hook gives the application the ability to provide real-time status information to a separate application or service that may be acting as a supervisor or a metrics collector.
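To illustrate that supervisor/metrics idea, here is a hedged sketch. All names are hypothetical, and the `stream_hook(response, fobj, chunk_size)` calling convention is an assumption; `report` stands in for whatever the supervisor exposes (an HTTP POST, a socket write, a statsd call, etc.):

```python
import time

def make_metrics_hook(report, interval=5.0):
    """Build a stream_hook that calls report(bytes_so_far) at most once
    per `interval` seconds, plus a final report when the stream ends."""
    def hook(response, fobj, chunk_size=1024):
        done = 0
        last = time.monotonic()
        for chunk in response.iter_content(chunk_size=chunk_size):
            fobj.write(chunk)
            done += len(chunk)
            now = time.monotonic()
            if now - last >= interval:
                report(done)  # push progress to the supervisor
                last = now
        report(done)  # final report once the stream is exhausted
    return hook
```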
## Conclusion
If a PR for this is something that the maintainers would be open to, I would like some feedback/suggestions on what the signature of the callable should be, assuming it is not acceptable as-is. It is probably best to pass the `chunk_size` parameter to the callable so the developer doesn't need to specify an arbitrary value themselves; the default value (1024) could be passed through to ensure consistent behavior. If it was desired to leave room for providing the callable with additional context or other information in the future, the recommended signature could be extended accordingly.
A completely different implementation would be something "heavier": a true adapter-style class, with a newly written base class provided by pyTenable. I don't see any value in the additional complexity that approach requires, so I'm not suggesting it.

If there is absolutely no interest in accepting an enhancement like this, I have no problem maintaining it on my own fork/branch, but it would be very nice for it to be considered as an upstream PR, especially given the low likelihood of it negatively impacting current users of pyTenable.
The branch for the above-referenced change can be found here. I do not intend to submit it as a PR unless/until there is some interest from the maintainers in accepting it.

Thank you. I appreciate any feedback on this, and I also appreciate the time and effort you put into maintaining this library. It's one of the most robust and well-documented libraries compared with those provided by other security vendors' products and services. I won't mention any names, but I've found that many vendors provide nothing more than a very basic test suite and some stale or out-of-date documentation.