python / cpython

The Python programming language
https://www.python.org
Other
63.13k stars 30.22k forks source link

http.server parse_request() bug and error reporting #72734

Closed warsaw closed 7 years ago

warsaw commented 7 years ago
BPO 28548
Nosy @warsaw, @vadmium
PRs
  • python/cpython#552
  • Files
  • parse-version.patch
  • parse-version.v2.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields: ```python assignee = None closed_at = created_at = labels = ['3.7', 'library'] title = 'http.server parse_request() bug and error reporting' updated_at = user = 'https://github.com/warsaw' ``` bugs.python.org fields: ```python activity = actor = 'dstufft' assignee = 'none' closed = True closed_date = closer = 'martin.panter' components = ['Library (Lib)'] creation = creator = 'barry' dependencies = [] files = ['45259', '45300'] hgrepos = [] issue_num = 28548 keywords = ['patch'] message_count = 9.0 messages = ['279611', '279634', '279637', '279639', '279640', '279646', '279673', '281190', '281194'] nosy_count = 3.0 nosy_names = ['barry', 'python-dev', 'martin.panter'] pr_nums = ['552'] priority = 'normal' resolution = 'fixed' stage = 'resolved' status = 'closed' superseder = None type = None url = 'https://bugs.python.org/issue28548' versions = ['Python 3.7'] ```

    warsaw commented 7 years ago

    This might also affect other Python version; I haven't checked, but I know it affects Python 3.5.

    In Mailman 3, we have a subclass of WSGIRequestHandler for our REST API, and we got a bug report about an error condition returning a 400 in the body of the response, but still returning an implicit 200 header.

    https://gitlab.com/mailman/mailman/issues/288

    This is pretty easily reproducible with the following recipe.

    $ git clone https://gitlab.com/mailman/mailman.git
    $ cd mailman
    $ tox -e py35 --notest -r
    $ .tox/py35/bin/python3.5 /home/barry/projects/mailman/trunk/.tox/py35/bin/runner --runner=rest:0:1 -C /home/barry/projects/mailman/trunk/var/etc/mailman.cfg 

    (Note that you might need to run .tox/py35/bin/mailman info first, and of course you'll have to adjust the directories for your own local fork.)

    Now, in another shell, do the following:

    $ curl -v -i -u restadmin:restpass "http://localhost:8001/3.0/lists/list example.com"

    Note specifically that there is a space right before the "example.com" bit.

    Take note also that we're generating an HTTP/1.1 request as per curl default.

    The response you get is:

    Notice the lack of response headers, and thus the implicit 200 return even though the proper error code is in the body of the response. Why does this happen?

    Now look at http.server.BaseHTTPRequestHandler. The default_request_version is "HTTP/0.9". Given the request, you'd expect to see the version set to "HTTP/1.1", but that doesn't happen because the extra space messes up the request parsing. parse_request() splits the request line by spaces and when this happens, the wrong number of words shows up. We get 4 words, thus the 'else:' clause in parse_request() gets triggered. So far so good.

    This eventually leads us to send_error() and from there into send_response() with the error code (properly 400) and message. From there we get to .send_response_only() and tracing into this function shows you where things go wrong.

    send_response_only() has an explicit test on self.request_version != 'HTTP/0.9', in which case it adds nothing to the _header_buffer. Well, because the request parsing got the unexpected number of words, in fact request_version *is* the default HTTP/0.9. Thus the error headers are never added.

    There are several problems here. Why are the headers never added when the request is HTTP/0.9? (I haven't read the spec.) Also, since parse_request() isn't setting the request version to 'HTTP/1.1'. It should probably dig out the words[-1] and see if that looks like a version string.

    Clearly the request isn't properly escaped, but http.server should not be sending an implicit 200 when the request is bogus.

    vadmium commented 7 years ago

    I think you should be able to reproduce this without Mailman or tox, by just running “python -m http.server”.

    The problem is the “HTTP/0.9” protocol that Python is assuming does not include a header section, so there is no place to put a 400 status code or header fields. The HTTP 0.9 response is supposed to only be a HTML body; see \https://www.w3.org/Protocols/HTTP/AsImplemented.html\ and \https://tools.ietf.org/html/rfc1945#section-6\.

    I think we should drop HTTP 0.9 response support from Python’s HTTP server, as well as the attempted but buggy request support. But there was a bit of resistance; see bpo-10721.

    Another possibility would be to change default_request_version so that error responses are sent as HTTP 1.0. But there may be more fixes needed for this to continue the buggy HTTP 0.9 support; see bpo-26578.

    vadmium commented 7 years ago

    Parsing the version from words[-1] rather than words[2] may be a minor improvement however.

    warsaw commented 7 years ago

    On Oct 28, 2016, at 10:42 PM, Martin Panter wrote:

    I think you should be able to reproduce this without Mailman or tox, by just running “python -m http.server”.

    Yep, indeed.

    The problem is the “HTTP/0.9” protocol that Python is assuming does not include a header section, so there is no place to put a 400 status code or header fields. The HTTP 0.9 response is supposed to only be a HTML body; see \https://www.w3.org/Protocols/HTTP/AsImplemented.html\ and \https://tools.ietf.org/html/rfc1945#section-6\.

    I think we should drop HTTP 0.9 response support from Python’s HTTP server, as well as the attempted but buggy request support. But there was a bit of resistance; see bpo-10721.

    Another possibility would be to change default_request_version so that error responses are sent as HTTP 1.0. But there may be more fixes needed for this to continue the buggy HTTP 0.9 support; see bpo-26578.

    I think it may indeed make sense to set default_request_version to 1.0. It's probably too late to do that for 3.6, but I do think it makes sense for 3.7. I'm nosying on the issues you mentioned.

    I've proposed essentially that workaround for Mailman.

    https://gitlab.com/mailman/mailman/issues/288

    We can get away with requiring at least HTTP/1.1 for our REST clients.

    warsaw commented 7 years ago

    On Oct 28, 2016, at 10:57 PM, Martin Panter wrote:

    Parsing the version from words[-1] rather than words[2] may be a minor improvement however.

    Indeed; I thought about that too.

    vadmium commented 7 years ago

    Here is a patch to parse the version from the request in more cases. Since it is more of a cosmetic improvement for handling erroneous requests, I would probably only apply it to the next Python version (3.7 atm). Or do you think it should go into bug fix versions as well?

    warsaw commented 7 years ago

    On Oct 29, 2016, at 02:20 AM, Martin Panter wrote:

    Here is a patch to parse the version from the request in more cases. Since it is more of a cosmetic improvement for handling erroneous requests, I would probably only apply it to the next Python version (3.7 atm). Or do you think it should go into bug fix versions as well?

    That would be a RM decision.

    1762cc99-3127-4a62-9baf-30c3d0f51ef7 commented 7 years ago

    New changeset 7c98768368cb by Martin Panter in branch 'default': Issue bpo-28548: Parse HTTP request version even if too many words received https://hg.python.org/cpython/rev/7c98768368cb

    vadmium commented 7 years ago

    Okay committed to 3.7 for the moment. I think that is all we can reasonably do unless we drop the pseudo-HTTP-0.9 support.