whatwg / fetch

Fetch Standard
https://fetch.spec.whatwg.org/
Other
2.1k stars 326 forks source link

redirects and etag/if-none-match #1770

Open wanderview opened 3 weeks ago

wanderview commented 3 weeks ago

What is the issue with the Fetch Standard?

@mrpickles and I were looking into how redirects and http caching work together recently. We noticed that redirects (301/302) don't seem to send if-none-match headers even if the redirect response contains an etag. This behavior seems consistent across chrome, firefox, and safari.

Does the spec currently reflect this? I can't seem to find where it defines this restriction. Step 25.2.2.2.1 of http-network-or-cache-fetch doesn't seem to check the status code of the stored response at all. I also don't see it in the http-caching RFC. Am I missing it somewhere or do we need to add this check to the spec?

For reference, I think this is where chromium restricts the status code:

https://source.chromium.org/chromium/chromium/src/+/main:net/http/http_cache_transaction.cc;l=3029;drc=aeeec53b304d068aa96fe84816d1d6c9d96f454e

mnot commented 3 weeks ago

Just to clarify:

Given request reqA that receives a response resA, and resA contains a redirection status code and Location header pointing to B. You seem to be expecting reqB to contain If-None-Match header fields, but it's not clear where the Etags in them were obtained: is it from a stored response from B, or from resA?

If it's a stored response from B, I agree that the request should contain a conditional (although it is not required to by HTTP, at least). However, if it's from reqA, that's problematic; ETags obtained from A have nothing to do with B.

MrPickles commented 3 weeks ago

it's not clear where the Etags in them were obtained: is it from a stored response from B, or from resA?

This is the order of operations:

  1. You visit A. The response contains an Etag header and a Location header that points to B. The response has a redirection status code.
  2. You get redirected to B. The response from B doesn't matter in this case.
  3. You visit A again. There's no If-None-Match header in the request, even though the earlier response from A had an Etag header.

So the question here is whether it's appropriate for subsequent requests to A to contain the header since a previous response had an ETag, even though the status code is a redirect.

mnot commented 3 weeks ago

Ah, I see, thanks.

annevk commented 3 weeks ago

Okay so if HTTP-network-or-cache fetch step 8.25.1 gives us a non-200 response we should not be pretending we can revalidate.

I guess the Chromium code says it makes sense for 206 as well, but isn't a 304 always translated to a 200?

MrPickles commented 3 weeks ago

Okay so if HTTP-network-or-cache fetch step 8.25.1 gives us a non-200 response we should not be pretending we can revalidate.

That's my understanding, at least. (I suppose it's also plausible that the storedResponse variable not having any body contents could cause it to be considered null.)

I guess the Chromium code says it makes sense for 206 as well, but isn't a 304 always translated to a 200?

What does 304 translating to a 200 mean? From what I can tell, a 304 response causes the client to completely ignore the response body.

So 304 behavior in practice seems inconsistent among browsers. See https://cr.kungfoo.net/mrpickles/http_cache/res_304_with_200_first.php for a quick demo of 304 and ETag behavior.

MrPickles commented 3 weeks ago

Okay so if HTTP-network-or-cache fetch step 8.25.1 gives us a non-200 response we should not be pretending we can revalidate.

Does something like https://github.com/MrPickles/fetch/commit/92fee8d25c509b930cbe20541c470d679f6cfaac look to be on the right track? (This is my first time doing web standards stuff.) If so, I can open up a pull request.

annevk commented 3 weeks ago

What does 304 translating to a 200 mean?

We end up replacing a 304 network response with its stored response (so that a fetch caller almost never sees a 304, unless they control the revalidating headers themselves). We currently don't assert that the stored response has a 200 status code, but I think the HTTP specifications do end up requiring that. Might need to be further clarified as well. @mnot's input on this would be appreciated.

Does something like https://github.com/MrPickles/fetch/commit/92fee8d25c509b930cbe20541c470d679f6cfaac look to be on the right track?

That does look good, modulo whether we should check for "ok status", "200", or "200 or 206".

MrPickles commented 2 weeks ago

For reference, I think this is where chromium restricts the status code:

https://source.chromium.org/chromium/chromium/src/+/main:net/http/http_cache_transaction.cc;l=3029;drc=aeeec53b304d068aa96fe84816d1d6c9d96f454e

There may be additional filtering elsewhere. From observation, the browser doesn't send If-None-Match headers if the original response returned an ETag but had a 206 status code.

Does something like https://github.com/MrPickles/fetch/commit/92fee8d25c509b930cbe20541c470d679f6cfaac look to be on the right track?

That does look good, modulo whether we should check for "ok status", "200", or "200 or 206".

Would it reasonable to start with specifying "ok status" and then tightening the spec as we gain more certainty around the intended behavior? Since step 8.25.2 currently specifies no restrictions, I'd imagine we'd want to start with the smallest reasonable "logical spec diff" and incrementally proceed from there.

annevk commented 2 weeks ago

Typically for changes to the standard we try to apply "measure twice cut once". As per https://whatwg.org/working-mode#changes we'll also need tests and from those we'd be able to inform ourselves as to what to require and which implementations might need changes. (This isn't always possible, but I think for this change it should be.)

MrPickles commented 1 week ago

Ack, sounds good.

I did a bit of poking around to see how the different browsers react (demo here). From what I'm seeing, the browser behaviors are inconsistent, and I haven't been able to pattern-match on what they actually do.

Let me get back to you on this once I've investigated a bit more and have a more rigorous description of each browser's behavior.

MrPickles commented 4 days ago

Alright, so I've documented some patterns from the three browsers. This is from making subsequent requests to the same URL that always returns an ETag but returns a random response status code.

Below is the behavior of each browser if you make multiple requests to the same URL. I can't believe I'm saying this, but it's easier to describe the existing behavior using (semi-)normative language...

Chrome

  1. Let etag be the browser's stored ETag value. By default, etag is null.
  2. Let sendIfNoneMatchHeaders be a boolean flag. By default, it is false.
  3. Let req be the HTTP request the browser is going to send.
  4. If sendIfNoneMatchHeaders is true and etag is not null, then:
    1. Set the If-None-Match headers of req to be etag.
  5. Send req to the URL.
  6. Let res be the HTTP response from req.
  7. Let status be the HTTP status code from res.
  8. Let responseEtag be the contents of the ETag header from res.
  9. If status is 200, then:
    1. If etag null, then:
      1. Set etag to be responseEtag.
    2. Set sendIfNoneMatchHeaders to true.
  10. If status is not 200 or 206, then:
    1. Set sendIfNoneMatchHeaders to true.
  11. Rinse and repeat.

So the way Chrome works is that it'll start storing ETags once it gets a 200. And it'll keep sending the ETag as long as it keeps getting 200's or 206's from the same URL. If the "chain gets broken" by a non-200/206, getting another 200 response will allow the browser to continue sending the original ETag.

Firefox

  1. Let etag be the browser's stored ETag value. By default, etag is null.
  2. Let neverSendIfNoneMatchHeadersEverAgain be a boolean flag. By default, it is false.
  3. Let req be the HTTP request the browser is going to send.
  4. If neverSendIfNoneMatchHeadersEverAgain is false and etag is not null, then:
    1. Set the If-None-Match headers of req to be etag.
  5. Send req to the URL.
  6. Let res be the HTTP response from req.
  7. Let status be the HTTP status code from res.
  8. Let responseEtag be the contents of the ETag header from res.
  9. If status is 2xx and etag is null, then:
    1. Set etag to be responseEtag.
  10. If status is not 2xx, then:
    1. Set neverSendIfNoneMatchHeadersEverAgain to true.
  11. Rinse and repeat.

Firefox seems to stop sending ETags indefinitely as soon as it receives a non-2xx from the server. But otherwise, it will save the ETag from a 2xx response and keep sending the ETag back as long as it has a 2xx response.

Safari

  1. Let etag be the browser's stored ETag value. By default, etag is null.
  2. Let req be the HTTP request the browser is going to send.
  3. If etag is not null, then:
    1. Set the If-None-Match headers of req to be etag.
  4. Send req to the URL.
  5. Let res be the HTTP response from req.
  6. Let responseEtag be the contents of the ETag header from res.
  7. Let shouldClearEtag to be whether (etag is not null).
  8. Set etag to be responseEtag.
  9. If shouldClearEtag, then:
    1. Set etag to null.
  10. Rinse and repeat.

Safari is weird. It doesn't care about the response code. Rather, it seems that for every other request, it clears the ETag.

MrPickles commented 4 days ago

The browser behaviors seem pretty inconsistent, so I'm at a loss for what the proper spec should be. Unless we keep the spec super vague, at least one browser will be violating the spec. How would you recommend proceeding?

wanderview commented 3 days ago

Any opinions? It seems like chrome and firefox are pretty close. We could try to sidebar about it at TPAC.