whatwg / fetch

Fetch Standard
https://fetch.spec.whatwg.org/
Other
2.12k stars 332 forks source link

"get, decode, and split" doesn't return example output #1768

Closed CarloCannas closed 2 months ago

CarloCannas commented 3 months ago

What is the issue with the Fetch Standard?

https://fetch.spec.whatwg.org/commit-snapshots/4cb3cf21946113c0684f04122dd95315fd10c567/#header-value-get-decode-and-split

The algorithm doesn't seem to return the same output of the first example.

The example says that giving nosniff, as input we should get a list of two strings: "nosniff" and an empty string "".

If I try to follow the steps however I get a list of a single "nosniff" string and no empty string.

Here's how I'm interpreting it:

  1. Let input be the result of isomorphic decoding value.
  2. Let position be a position variable for input, initially pointing at the start of input.
  3. Let values be a list of strings, initially empty.
  4. Let temporaryValue be the empty string.
    input:   "nosniff,"
    position: ^
    values: []
    temporaryValue: ""
    1. While position is not past the end of input:
    2. Append the result of collecting a sequence of code points that are not U+0022 (") or U+002C (,) from input, given position, to temporaryValue.
      input:   "nosniff,"
      position:        ^
      values: []
      temporaryValue: "nosniff"
    1. If position is not past the end of input, then:
      1. ~If the code point at position within input is U+0022 ("), then:~
        1. ~Append the result of collecting an HTTP quoted string from input, given position, to temporaryValue.~
        2. ~If position is not past the end of input, then continue.~
      2. Otherwise:
        1. Assert: the code point at position within input is U+002C (,).
        2. Advance position by 1.
          input:   "nosniff,"
          position:         ^
          values: []
          temporaryValue: "nosniff"
  5. 2.

    1. Remove all HTTP tab or space from the start and end of temporaryValue.
    2. Append temporaryValue to values.
      input:   "nosniff,"
      position:         ^
      values: ["nosniff"]
      temporaryValue: ""
  6. 2.

    1. Set temporaryValue to the empty string.

Here position is past the end of input, the while condition is not satisfied and this returns a list with just a single element: ["nosniff"].

  1. Return values.

The examples seem right to me, I think the algorithm is wrong.

I found some WPT that indirectly test this algorithm (via Content-Length), Chrome and Firefox seem to follow the behavior of the example.

https://github.com/web-platform-tests/wpt/blob/merge_pr_47660/fetch/content-length/resources/content-lengths.json#L66-L69 https://wpt.fyi/results/fetch/content-length/parsing.window.html

I.e.: a header value with a comma as the last character is split into two values, as in the example. Since the header is Content-Length and the spec mandates to abort if the split values don't match, the test passes if the request results in a network error.

This should be the relevant Chrome implementation, it looks like it internally stores every header splitted https://source.chromium.org/chromium/chromium/src/+/main:net/http/http_response_headers.cc;l=967-970;drc=82dff63dbf9db05e9274e11d9128af7b9f51ceaa;bpv=0;bpt=1 https://source.chromium.org/chromium/chromium/src/+/main:net/http/http_util.cc;l=968;drc=82dff63dbf9db05e9274e11d9128af7b9f51ceaa

Firefox on the other hand doesn't seem to do that, the Content-Length check appears to be implemented using a simpler split by just commas, ignoring quotes. It makes sense: a quote char inside Content-Length already makes it invalid so there's no need to care about quotes while splitting. https://searchfox.org/mozilla-central/rev/53e68046298557fae0c922230b595bb6689bf587/netwerk/protocol/http/nsHttpHeaderArray.cpp#173-189 https://searchfox.org/mozilla-central/rev/53e68046298557fae0c922230b595bb6689bf587/netwerk/protocol/http/nsHttpHeaderArray.h#308

annevk commented 3 months ago

Thank you for finding and reporting this! I think I found a suitable fix and posted it in #1769.