whatwg / fetch

Fetch Standard
https://fetch.spec.whatwg.org/
Other
2.11k stars 331 forks source link

HTTP/1 parsing #1156

Open annevk opened 3 years ago

annevk commented 3 years ago

HTTP/1 was poorly tested for error handling and as a result implementations are not interoperable. This lack of interoperability reaches Fetch at times. Where HTTP/1 itself takes no stance or implementations cannot follow it, it might make sense to document it in Fetch or at least document it in this issue.

cc @whatwg/http

(Previously there was https://wiki.whatwg.org/wiki/HTTP.)

MattMenke2 commented 3 years ago

Unfortunately, removing old behavior here is now pretty resource intensive, in terms of launch process in Chrome:

As I'm not longer on the network team, I have limited time to work on this sort of stuff. That having been said, I certainly support adding tests and working towards both tighter parsing and unifying behavior here, and I can probably find time to address what I view as most concerning (which, here, is status code handling/overflow).

@kinu @yutakahirano: Kinuko, Yutaka, would your team be interested in pursuing this?

MattMenke2 commented 3 years ago

Another fun one: Chrome allows 4 bytes of random slop between requests, before the start of headers. I believe this was copied from FireFox of IE's behavior at the time. I don't believe all browsers currently do this, so it may be easier to remove. My main concern is servers sending 0-length compressed gzip bodies with "Content-Length: 0", which aren't actually of length 0. In this case, there's slop on the end, which I suspect this logic deals with.

JannisBush commented 6 months ago

Several other differences I discovered:

  1. Space (or tab) between header-name and :, e.g., x-frame-options : DENY:
    • HTTP spec says that servers MUST reject such responses and proxies must remove such whitespace. It does not say what clients should do.
    • Firefox ignores only the invalid header
    • Chromium and Safari ignore the space, even the invalid header gets parsed and in the example here: XFO is applied
  2. NULL bytes in header names
    • Invalid according to the spec (token) but unclear what clients should do
    • Firefox and Safari only ignore the invalid header
    • Chromium treats the whole response as invalid (network error), similar to NULL in header values
  3. NULL in header values
    • Agreement seems to be network error
    • Tests do not test for all kind of inclusions: e.g., no tests for fetch where currently browser behavior diverges
  4. Lone LF (0x0A) in responses
    • Similar to lone CR
    • Currently behavior diverges for both browsers and within browsers (fetch (simple vs complex) vs document vs subresource, ...)
    • Behavior also depends on where the lone LF is encountered, adding a bunch of tests is probably helpful
    • Probably related: https://github.com/whatwg/fetch/issues/472
  5. Lone CR (0x0D) in responses
MattMenke2 commented 6 months ago

A couple thoughts (note that I'm no longer on Chrome's network team, though still care about this area).

Several other differences I discovered:

  1. Space (or tab) between header-name and :, e.g., x-frame-options : DENY:
  • HTTP spec says that servers MUST reject such responses and proxies must remove such whitespace. It does not say what clients should do.

It would seem very weird if proxies remove it, but clients were expected to reject it. Then some sites would only work with a compliant browser connecting through a compliant proxy, but not with a compliant browser directly making the requests.

  1. NULL bytes in header names
  2. NULL in header values

Chrome currently rejects responses with NULLs anywhere in the headers (including in status text). Not claiming this is the correct behavior.

Other weirdness that may be worth thinking about, if you haven't already. Only know how Chrome handles these: