psf / requests

A simple, yet elegant, HTTP library.
https://requests.readthedocs.io/en/latest/
Apache License 2.0
52.17k stars 9.33k forks source link

requests gets stuck on post redirect where data is readable #3079

Closed tzickel closed 7 years ago

tzickel commented 8 years ago

Checked on python 2.7. Unfortunately httpbin does not support 307 post redirects and thus it is not easy to test this issue in requests test suite currently.

Anyhow the issue seems to be with the stream being already over when the redirects tries to read it again. The response should be an exception if a stream is inputed and it's redirected (although this is known only in httplib).

example code:

from flask import Flask, request, redirect, url_for
import requests
import io

app = Flask('test')

@app.route('/redirect/', methods=['POST'])
def redirect():
  response = app.make_response('')
  response.status_code = 307
  response.headers['Location'] = 'http://localhost:5000/'
  return response

@app.route('/', methods=['POST'])
def index():
  return request.get_data()

if __name__ == "__main__":
  import threading
  def loop():
    app.run()
  t = threading.Thread(target=loop)
  t.daemon = True
  t.start()
  if requests.post('http://localhost:5000/redirect/', data='hey').text != 'hey':
    raise Exception()
  if requests.post('http://localhost:5000/redirect/', data=io.BytesIO('hey')).text != 'hey': # <-- This code gets stuck
    raise Exception()
Lukasa commented 8 years ago

So this is a very real problem: when passing requests any file-like object it does not rewind the file pointer on redirect. That means that if the original POST consumed the data, the follow-on POST will not.

With normal files, that causes a small behavioural error, because we'll usually conclude that the file is zero-length and POST nothing. That's wrong, but program execution continues. With BytesIO we have a separate problem, which is that we use len(BytesIO.getvalue()) to work out how much data we're going to send. getvalue() doesn't change with read(), which means that we tell the remote server that we're going to send 3 bytes of data but provide an empty body. This obviously causes the server to wait for the body that is never going to come.

Fixing this requires a degree of care: the PreparedRequest needs to make a note, for any object with a tell() method and a read() method, of what the current location of the file pointer is. It then needs to check, in resolve_redirects, whether it's keeping the request body, and if it is and has a saved location of the file pointer then it needs to restore that pointer.

This should be entirely do-able though. Are you interested in submitting a patch to fix this?

tzickel commented 8 years ago

I don't like that BytesIO.getvalue() is called at super_len, because if i'm passing large data (let's say 100MB of data), the getvalue will allocate a new copy of it in memory, than super_len will get the length, than deallocate it...

Unfortunately on python 2 I'm not sure if there is a better way for this (in python 3, I think you can get the object's buffer, and check it's length, to skip this allocation).

Maybe we should use chunked transfer for objects like that ?


Also If I understand correctly, if you pass files= instead of data=, all files will be read into memory in the message preparation phase, and they will not be streamed from disk like passing a single file in data, yes ?

Lukasa commented 8 years ago

@tzickel As far as I know, BytesIO implements tell() on all Python versions, so the simplest thing to do is to add logic that does tell(), seek(), tell(), seek(). Doing that, incidentally, would allow us to also record the current location in the file-like-object so that we can safely rewind.

@sigmavirus24, you wrote the original getvalue() call: what are the odds you remember the rationale at the time?

tzickel commented 8 years ago

Nice, something like prepare_content_length is doing. Maybe there is another bug (just by reading code, haven't tried to trigger it yet), that if you pass data that has 'read' / 'seek' / 'tell' but no 'iter', it will go in prepare_body to the non-stream code path, and call prepare_content_length, which will than move the file pointer to the start of the file instead of to it's position when passed into requests (I assume this will be need to be a custom object, since most stuff with read/seek/tell have iter as well).

I've also added another question in my previous comment, regarding passing real files to files= param, and it will copy them into memory instead of streaming them from disk ?

Lukasa commented 8 years ago

I suspect that there are some latent bugs in this general area, sadly. For each one, if you can provide a test case that repros the problem, we can look at fixing it. =)

Also If I understand correctly, if you pass files= instead of data=, all files will be read into memory in the message preparation phase, and they will not be streamed from disk like passing a single file in data, yes ?

Correct: files sends a multipart file upload. If you need to stream it, the requests toolbelt has an object for that.

tzickel commented 8 years ago
  1. I think that httpbin's /redirect and /redirect-to should both support all headers, not just get and should have an added optional parameter for the HTTP code (which 30X) since the current 302 is too limiting (requests changes them all to GET).
  2. While I don't use the tool belt, it too should be checked for this issue (when it streams).
Lukasa commented 8 years ago

For limitations with httpbin I'm afraid you'll have to take that up on their repository.

tzickel commented 8 years ago

Here is a demo for the other bug (without redirect this time), is it too nit-picking ? should it have another issue report in your opinion ? https://gist.github.com/tzickel/acac6bf2faaaadac1245d8a2d8683516

Lukasa commented 8 years ago

I think keeping it all in the same issue is a good idea. =) Once we get httpbin working for this, we can write some tests and some fixes.

tzickel commented 8 years ago

This other bug can actually work with httpbin, since it's using a regular post method.

Lukasa commented 8 years ago

In that case, feel free to open a pull request with a test case that reproduces the bug and we can go from there. =)

(Make sure it fails, rather than just timing out, by setting a timeout parameter on the request.)

tzickel commented 8 years ago
  1. Shouldn't requests have an option to make 302 and 307 act the same (instead of as 303 as it does by default) ? The HTTP 1.1 spec actually meant that 302 should be like 307 by default (yet most browsers treat it like a 303).
  2. I'm still waiting for the input from @sigmavirus24 because that is important memory-wise.
Lukasa commented 8 years ago

Shouldn't requests have an option to make 302 and 307 act the same (instead of as 303 as it does by default) ?

I don't believe so, no. In general it's substantially more consistent to behave like a browser in this situation, given the fact that POST -> 302 -> GET is a extremely common pattern. The advantage of having a switch isn't really there. If needed, you can turn redirects off and handle them yourself to get this behaviour.

tzickel commented 8 years ago

Ok. https://github.com/kennethreitz/requests/pull/3082 is for the second issue. The first / main / bigger issue cannot be auto-tested with httpbin for now (I think).

sigmavirus24 commented 8 years ago

I don't remember the reasoning for the getvalue call. I'm tempted to say that existed before I extracted it out into a function and I was using it as a basis for the new function to handle all things. It's also positive I just was very naive in how I implemented support for String/Bytes IO objects. I don't honestly remember.

nateprewitt commented 8 years ago

So I believe that #3082, #3535, #3655, and Runscope/httpbin#284 in combination should have addressed all the pieces brought up in this issue. @tzickel, is there anything else you feel is missing? If not, I'm proposing we close this as resolved.

nateprewitt commented 7 years ago

Alright, with 2.12 released and no further comments from @tzickel, I think we can consider this closed.

gkalidas commented 4 years ago

Hi guys,

I'm having the same issue, request.post gets stuck with the following error. Sharing the Traceback below.

NOTE: This works fine on localhost, It's not working on an instance.

DEBUG:river.apps: RiverApp is loaded. (2020-02-19 15:52:08; apps.py:27) DEBUG:urllib3.connectionpool: Starting new HTTP connection (1): instance_ip_address:8000 (2020-02-19 15:52:16; connectionpool.py:225) ERROR:django.request: Internal Server Error: /api/v1/user/login/ (2020-02-19 15:53:16; exception.py:124) Traceback (most recent call last): File "/home/Projects/gk/env_cn/lib/python2.7/site-packages/django/core/handlers/exception.py", line 39, in inner response = get_response(request) File "/home/Projects/gk/env_cn/lib/python2.7/site-packages/django/core/handlers/base.py", line 249, in _legacy_get_response response = self._get_response(request) File "/home/Projects/gk/env_cn/lib/python2.7/site-packages/django/core/handlers/base.py", line 187, in _get_response response = self.process_exception_by_middleware(e, request) File "/home/Projects/gk/env_cn/lib/python2.7/site-packages/django/core/handlers/base.py", line 185, in _get_response response = wrapped_callback(request, *callback_args, callback_kwargs) File "/home/Projects/gk/env_cn/lib/python2.7/site-packages/django/views/decorators/csrf.py", line 58, in wrapped_view return view_func(*args, *kwargs) File "/home/Projects/gk/env_cn/lib/python2.7/site-packages/django/views/generic/base.py", line 68, in view return self.dispatch(request, args, kwargs) File "/home/Projects/gk/env_cn/lib/python2.7/site-packages/rest_framework/views.py", line 483, in dispatch response = self.handle_exception(exc) File "/home/Projects/gk/env_cn/lib/python2.7/site-packages/rest_framework/views.py", line 443, in handle_exception self.raise_uncaught_exception(exc) File "/home/Projects/gk/env_cn/lib/python2.7/site-packages/rest_framework/views.py", line 480, in dispatch response = handler(request, *args, *kwargs) File "/home/Projects/gk/env_cn/lib/python2.7/site-packages/rest_framework/decorators.py", line 52, in handler return func(args, kwargs) File "./gkAuth/views.py", line 97, in authenticate_user resp = requests.post(url=url, data="username=%s&password=%s" % (email, raw_password), headers={"Content-Type": "application/x-www- form-urlencoded"}) File "/home/Projects/gk/env_cn/lib/python2.7/site-packages/requests/api.py", line 116, in post return request('post', url, data=data, json=json, kwargs) File "/home/Projects/gk/env_cn/lib/python2.7/site-packages/requests/api.py", line 60, in request return session.request(method=method, url=url, kwargs) File "/home/Projects/gk/env_cn/lib/python2.7/site-packages/requests/sessions.py", line 533, in request resp = self.send(prep, send_kwargs) File "/home/Projects/gk/env_cn/lib/python2.7/site-packages/requests/sessions.py", line 646, in send r = adapter.send(request, **kwargs) File "/home/Projects/gk/env_cn/lib/python2.7/site-packages/requests/adapters.py", line 498, in send raise ConnectionError(err, request=request) ConnectionError: ('Connection aborted.', BadStatusLine('No status line received - the server has closed the connection',))