HTTP/1 parsing - Githubissues

annevk commented 3 years ago

HTTP/1 was poorly tested for error handling and as a result implementations are not interoperable. This lack of interoperability reaches Fetch at times. Where HTTP/1 itself takes no stance or implementations cannot follow it, it might make sense to document it in Fetch or at least document it in this issue.

status-line parsing: https://github.com/web-platform-tests/wpt/pull/5102
status-code parsing: https://github.com/whatwg/fetch/issues/1142
message parsing (in particular incomplete responses): https://github.com/whatwg/fetch/issues/472
CR parsing: https://github.com/web-platform-tests/wpt/pull/13524
0x00 parsing: https://github.com/whatwg/xhr/issues/165#issuecomment-380447687 (apart from Firefox and a bug in Safari this is pretty close; agreement here is network error)
field-value is "incorrect" (this might go beyond H/1, not sure): https://fetch.spec.whatwg.org/#concept-header-value (there is agreement on this, modulo 0x00 above)
Parsing of specific headers (this goes beyond H/1), such as Content-Type and Content-Length: https://github.com/whatwg/fetch/issues/814

cc @whatwg/http

(Previously there was https://wiki.whatwg.org/wiki/HTTP.)

MattMenke2 commented 3 years ago

Unfortunately, removing old behavior here is now pretty resource intensive, in terms of launch process in Chrome:

Add code to gather data, wait months for data.
Get feedback from other browsers [I don't think I've ever been in contact with a single Safari person]
Go through intent process (Tag review can probably be skipped, at least)
Usher through release process

As I'm not longer on the network team, I have limited time to work on this sort of stuff. That having been said, I certainly support adding tests and working towards both tighter parsing and unifying behavior here, and I can probably find time to address what I view as most concerning (which, here, is status code handling/overflow).

@kinu @yutakahirano: Kinuko, Yutaka, would your team be interested in pursuing this?

MattMenke2 commented 3 years ago

Another fun one: Chrome allows 4 bytes of random slop between requests, before the start of headers. I believe this was copied from FireFox of IE's behavior at the time. I don't believe all browsers currently do this, so it may be easier to remove. My main concern is servers sending 0-length compressed gzip bodies with "Content-Length: 0", which aren't actually of length 0. In this case, there's slop on the end, which I suspect this logic deals with.

JannisBush commented 6 months ago

Several other differences I discovered:

Space (or tab) between header-name and :, e.g., x-frame-options : DENY:
- HTTP spec says that servers MUST reject such responses and proxies must remove such whitespace. It does not say what clients should do.
- Firefox ignores only the invalid header
- Chromium and Safari ignore the space, even the invalid header gets parsed and in the example here: XFO is applied
NULL bytes in header names
- Invalid according to the spec (token) but unclear what clients should do
- Firefox and Safari only ignore the invalid header
- Chromium treats the whole response as invalid (network error), similar to NULL in header values
NULL in header values
- Agreement seems to be network error
- Tests do not test for all kind of inclusions: e.g., no tests for fetch where currently browser behavior diverges
Lone LF (0x0A) in responses
- Similar to lone CR
- Currently behavior diverges for both browsers and within browsers (fetch (simple vs complex) vs document vs subresource, ...)
- Behavior also depends on where the lone LF is encountered, adding a bunch of tests is probably helpful
- Probably related: https://github.com/whatwg/fetch/issues/472
Lone CR (0x0D) in responses
- Mostly tested in https://github.com/web-platform-tests/wpt/pull/13524
- Currently, only tests with fetch, it seems to be sometimes different to top-level, or subresource parsing

MattMenke2 commented 6 months ago

A couple thoughts (note that I'm no longer on Chrome's network team, though still care about this area).

Several other differences I discovered:

Space (or tab) between header-name and :, e.g., x-frame-options : DENY:

HTTP spec says that servers MUST reject such responses and proxies must remove such whitespace. It does not say what clients should do.

It would seem very weird if proxies remove it, but clients were expected to reject it. Then some sites would only work with a compliant browser connecting through a compliant proxy, but not with a compliant browser directly making the requests.

NULL bytes in header names

NULL in header values

Chrome currently rejects responses with NULLs anywhere in the headers (including in status text). Not claiming this is the correct behavior.

Other weirdness that may be worth thinking about, if you haven't already. Only know how Chrome handles these:

Chrome allows 4 bytes of slop before the "http" (including NULLs). I assume this is for buggy servers that send 0-byte compressed bodies when they should be sending nothing, but not positive of that. Think this was copied from another browser's behavior before Chrome was released.
Chrome allows truncated headers (Server closes socket after only sending partial headers - that is, no double terminating CRLF, or possibly no terminating CRLF at all). Chrome only does this for HTTP (not HTTPS), for security reasons, since truncating an http-only or whatnot from a Set-Cookie header is a security issue.

whatwg / fetch

HTTP/1 parsing #1156