tornadoweb / tornado

Tornado is a Python web framework and asynchronous networking library, originally developed at FriendFeed.
http://www.tornadoweb.org/
Apache License 2.0
21.78k stars 5.51k forks source link

httputil.py parse_response_start_line() raise HTTPInputError when ResponseStartLine tuple missing "reason" key in response header #2231

Closed layer3switch closed 6 years ago

layer3switch commented 6 years ago

While most HTTP server response will provide "version", "code" and "reason" in response header string such as "HTTP/1.1 200 OK", it is also not necessary to include "OK" as reason key value. Some http servers will not include reason in its header response.

When "reason" key value is missing in the header, tornado.httputil.parse_response_start_line() will raise HTTPInputError and fails to parse rest of response header.

As shown below when "OK" is missing in the first header line;

>>> tornado.httputil.parse_response_start_line("HTTP/1.1 200")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/site-packages/tornado/httputil.py", line 854, in parse_response_start_line
    raise HTTPInputError("Error parsing response start line")
tornado.httputil.HTTPInputError: Error parsing response start line

In parse_response_start_line() definition, re.match expects exactly 3 keys assigned from line string due to space as delimiter; https://github.com/tornadoweb/tornado/blob/c60c7f0ec4593602225a877a56b1defc6002addb/tornado/httputil.py#L853

My proposal to fix this, remove space between 2nd and 3rd match case.

    match = re.match("(HTTP/1.[0-9]) ([0-9]+)([^\r]*)", line)
layer3switch commented 6 years ago

Relevant pull-request #2233

bdarnell commented 6 years ago

The reason phrase may be empty, but the space before it is not optional. The spec (RFC 7230 section 3.1.2) is clear on this (confirmed on the http-wg mailing list). Any server which sends HTTP/1.1 200\r\n is not compliant with the spec and should be fixed.

See #857.

layer3switch commented 6 years ago

Pardon my comment on closed issue, but RFC 7230 3.2.1 clearly states;

A client SHOULD
   ignore the reason-phrase content.

So I don't want to jump to conclusion here, but are you suggesting removing reason phrase (match.group(3)) completely from return value for parse_response_start_line()? Rather than relax the client side check of server response as I've proposed?

bdarnell commented 6 years ago

Tornado already ignores the reason-phrase content. We make it available in case an application wants to inspect or log it (occasionally servers will use a non-standard response code and the reason code can be helpful in figuring out what went wrong), but other than that we just consume it and throw it away.

SideshowAlex commented 6 years ago

Would it be possible to revisit your decision to not act on this issue?

As it currently stands, I will not be able to use tornado in my CI setup, as we rely upon obtaining artifacts from a JFrog Artificatory, which doesn't follow RFC 7230 in this respect.

ploxiln commented 6 years ago

Does curl accept the missing space? Maybe you could work-around your issue by using the CurlAsyncHTTPClient instead of the SimpleAsyncHTTPClient?

SideshowAlex commented 6 years ago

@ploxiln - I am using tornado via saltstack. The call is abstracted away from me, so I don't have the opportunity to change what client is being used.

ploxiln commented 6 years ago

There must be a bit more to the story - if saltstack always uses tornado and artifactory always returns an http response line with no space after the code, then how does does the saltstack artifactory plugin ever work for anybody? https://docs.saltstack.com/en/latest/ref/modules/all/salt.modules.artifactory.html

ploxiln commented 6 years ago

After digging a little, I see that salt.modules.http uses salt.utils.http uses tornado simple_httpclient. But salt.modules.artifactory uses the python standard library urllib, probably for this reason.

SideshowAlex commented 6 years ago

Before Christmas, it was working, so an update to the Artifactory service may of happened to introduce this issue. It was reported in saltstack as https://github.com/saltstack/salt/issues/45208, but the solution to use urllib2 globally is not desirable.

bdarnell commented 6 years ago

@SideshowAlex Why are you reporting this to tornado and saltstack instead of to jfrog artifactory? Why not fix this in the component that is failing to comply with the relevant standards?

SideshowAlex commented 6 years ago

@bdarnell, I have reported it to jfrog (https://www.jfrog.com/jira/projects/RTFACT/issues/RTFACT-15629). As Artifactory is a closed source offering with an unknown delivery schedule, waiting for a fix from them is not something that I can work with.

I 100% agree that they their implementation is incorrect, and that this is where it should be fixed. As a developer, I understand why you would be hesitant in not making the change. On the other hand, as a developer, I appreciate when modules have some degree of robustness.

Feel free to disregard my request. If I need a fix, I can create a fork, and make the changes (with blackjack and hookers) 😄 .

layer3switch commented 6 years ago

I could be bias on this, but my 2 cents is that my previous proposed change doesn't break the RFC 7230 implementation in tornado but rather to meet the harsh reality of client-side parsing of first response line regardless non-conforming server-side RFC 7230 standard. Unfortunately @bdarnell seems to have taken a hardliner's stance on client-side parser acting as enforcer of server-side standard.

For @SideshowAlex, you can replace tornado http backend in saltstack with requests without any impact.

layer3switch commented 6 years ago

Without turning this into fruitless mental exercise, to be fair, this parser never checked for the standard spec RFC 7230 Section 3.1.2 properly.

https://github.com/tornadoweb/tornado/blob/c60c7f0ec4593602225a877a56b1defc6002addb/tornado/httputil.py#L853-L857

re.match 1st filter contains (HTTP/1.[0-9]) which violates ABNF rule format where HTTP-version is HTTP-version = HTTP-name "/" DIGIT "." DIGIT.

re.match 2nd filter contains ([0-9]+) which regex match can be any number of numeric digits where only 3 digits must be applied as written in RFC 7230 Section 3.1.2

The status-code element is a 3-digit integer code
...
        status-code    = 3DIGIT

Below re.match expression at least tries to match based on standard RFC spec while relaxing server-side response semantics.

    match = re.match("(HTTP/\d{1}.\d{1})(\s)(\d{3,})(\s?)([^\r]*)", line)
    if not match or len(match.group(3)) > 3:
        raise HTTPInputError("Error parsing response start line")
    elif match.group(4) == '':
       reason = ''
    else:
       reason = match.group(5)
    return ResponseStartLine(match.group(1), int(match.group(3)), reason)
bdarnell commented 6 years ago

I'm more lazy than a hardliner about this, especially given the removal of reason phrases from HTTP/2. But I don't want to bow to pressure here until you've applied at least as much pressure to jfrog.

As Artifactory is a closed source offering with an unknown delivery schedule

Sounds like you know more about Tornado's delivery schedule than I do :)

But as a closed source product, someone presumably gets paid to work on Artifactory, unlike Tornado. And based on this comment, it was a recently introduced bug, which raises hope for a swift fix. And even if this were changed in Tornado, we're still a couple of months away from the next Tornado release (5.0), and then saltstack may need to be updated for compatibility with 5.0, and even then there may be distribution concerns.