requests / toolbelt

A toolbelt of useful classes and functions to be used with python-requests
https://toolbelt.readthedocs.org
Other
999 stars 186 forks source link

Dumping streamed responses breaks later processing #299

Open neg3ntropy opened 4 years ago

neg3ntropy commented 4 years ago

I am sometimes using stream=True and then consuming the raw response chunk by chunk. I if dump one of those responses using toolbelt's dump(), it consumes all the content and later processing fails because the download stream is now gone and closed.

As for outgoing request data, when a file-like payload is used the body is not dumped, for response bodies this should be achievable too, dumping only the headers.

It appears to me that there is no way to tell that a response is being streamed from the response object itself, so an optional parameters would be needed on dump (I might be wrong).

sigmavirus24 commented 4 years ago

For dumping a response:

We have no way to rewind the streamable content so the only way we could responsibly handle that would be to tee it off to an in-memory ByteIO buffer and then replace the underlying file-like object with that so you can handle it as a chunkable response. That said, that will break other assumptions your code has about the response (not taking up all available RAM for example).

For dumping a request:

We could .seek(0, 0) but that would be wrong for many users. Why? Some people have a buffer and chunk uploads by size, and seeking to the very beginning would be wrong because that request started somewhere else. It's not possible for us to handle this intelligently.

dump could be made smarter, sure, but it's a matter of diminishing returns.

neg3ntropy commented 4 years ago

In case I was not too clear, I am happy with what's happening for requests and I think that the only reasonable course of action is to not dump the body for stream responses.

sigmavirus24 commented 4 years ago

:+1: I'd be happy to review a PR doing this

neg3ntropy commented 4 years ago

I actually found a way to detect that if the response is being streamed that seems to work in my case. Please check #300. If the approach is ok, I will add a test.