whatwg / fetch

Fetch Standard
https://fetch.spec.whatwg.org/
Other
2.11k stars 329 forks source link

Spec unexpectedly requires caching 30x responses themselves — rather than caching the result of running HTTP-redirect fetch to follow the 30x redirects #1765

Closed sideshowbarker closed 2 months ago

sideshowbarker commented 2 months ago

What is the issue with the Fetch Standard?

The HTTP fetch algorithm at https://fetch.spec.whatwg.org/#ref-for-concept-http-network-or-cache-fetch says:

Set response and internalResponse to the result of running HTTP-network-or-cache fetch given fetchParams

.…where HTTP-network-or-cache fetch ends up doing https://fetch.spec.whatwg.org/#ref-for-response.cacheability, which says:

Store httpRequest and forwardResponse in httpCache, as per the "Storing Responses in Caches" chapter of HTTP Caching. [HTTP-CACHING]

But, that happens before a later HTTP fetch step at https://fetch.spec.whatwg.org/#ref-for-redirect-status②, that says:

If internalResponse’s status is a redirect status:… Set response to the result of running HTTP-redirect fetch given fetchParams and response.

So the effect of that order of operations (caching before redirecting) seems to be: The spec unexpectedly requires caching 30x responses themselves — rather than caching the result of running HTTP-redirect fetch to follow the 30x redirects.

The Fetch implementation in Ladybird very closely follows the spec, and so following the spec seems to cause the bug reported at https://github.com/LadybirdBrowser/ladybird/issues/863, which ends up in the 30x responses themselves getting cached.

And the workaround that’s been implemented in Ladybird for now was to add some ad-hoc non-spec-conforming code for special-casing 30x responses so they don’t get cached where the Fetch spec currently seems to require them to be.

Anyway, as far as I can see, no other existing UAs actually implement the caching requirements for 30x responses in the spec as currently written — because if other UAs did, then it seems like the observable behavior in those UAs would match what was observed in Ladybird prior to the https://github.com/LadybirdBrowser/ladybird/pull/899/files workaround.

annevk commented 2 months ago

As far as I know redirect responses are cacheable. I'm also not sure why that would result in the redirect not getting followed.

sideshowbarker commented 2 months ago

As summarized in the issue description, the only response caching that the Fetch spec defines is defined as occurring prior to HTTP-redirect fetch being run.

And Ladybird doesn’t implement any caching of its own separate from what the Fetch spec defines.

I guess other UAs do some other caching of their own outside of what the Fetch spec defines, and they do it after HTTP-redirect fetch is run. And that’s fine, because I guess any UA is free to do HTTP response caching whenever and wherever they want — and because other UA’s HTTP caching behavior was implemented long before they implemented Fetch.

But the Fetch spec doesn’t define anything about how HTTP caching should work after is HTTP-redirect fetch being run. Instead the spec only defines HTTP caching prior to HTTP-redirect fetch being run.

And Ladybird has never had any HTTP caching implemented separate from the the Fetch spec requires — instead, Ladybird strictly implements only what the Fetch spec explicitly requires, and exactly in the order the spec requires it.

sideshowbarker commented 2 months ago

As far as I know redirect responses are cacheable. I'm also not sure why that would result in the redirect not getting followed.

OK, I guess I see now what you’re saying: The UA is intentionally supposed to cache each 30x response, and then re-follow that redirect every time it re-loads that 30x response from its cache?

annevk commented 2 months ago

Yeah. I'm not sure why the a redirect would not be followed when it comes from the cache. It's still a response with a redirect status as far as Fetch is concerned and will be processed as any other response.

sideshowbarker commented 2 months ago

Why not instead have the spec instead require caching of the result of the HTTP-redirect fetch?

It’s not clear at all to me what value there would be for a UA to cache the 30x redirect responses themselves, if the UA is every time just going to load the 30x and then right away make a new request for the resource/response it redirects to?

I mean, I realize the UA will (should) likely also have a cached response for the resource the 30x redirects to. But the required spec behavior still results in an additional request being made, for every 30x redirect, every time.

annevk commented 2 months ago

If I'm understanding correctly what you're suggesting then the request URL would not be updated and a bunch of security checks would not end up happening?

Furthermore, a temporary redirect in particular might also change what it redirects to pretty frequently, more frequently than any final response might change.

There is probably some further short-circuiting possible for permanent redirects, but it would have to be build on top of what we have today I think as we would still have to update the request URL and things like that. And it's not entirely clear what the pay-off for the specification would be to describe such a mechanism.

dotnetCarpenter commented 2 months ago

If a 301 or 308 response is given, then the UA are expected to update its stored link to the new URI.

However, I'm very unsure what the phrase "engaged in authoring content" is refering to:

However, this suggestion is usually ignored unless the user agent is actively editing references (e.g., engaged in authoring content)

Spec: https://httpwg.org/specs/rfc9110.html#status.301 Easier explanation: https://http.dev/301

Are there any reason why the fetch spec and HTTP spec diverge on updating the URI?

Note, that the HTTP spec does not say to change the URI indefinitely. But only according to Section 4.2.2 - Heuristic Freshness.


EDIT: If the redirect is not permanent, the web server SHOULD use 307 Temporary Redirect.

sideshowbarker commented 2 months ago

I’m going ahead and closing this, because after reading up, I have a better understanding than I did when I filed this — to the point where I would not have ended up raising this to begin with, if I had understand better.

If I'm understanding correctly what you're suggesting then the request URL would not be updated and a bunch of security checks would not end up happening?

I didn’t understand enough to know that would be a consequence.

Furthermore, a temporary redirect in particular might also change what it redirects to pretty frequently, more frequently than any final response might change.

There is probably some further short-circuiting possible for permanent redirects, but it would have to be build on top of what we have today I think as we would still have to update the request URL and things like that. And it's not entirely clear what the pay-off for the specification would be to describe such a mechanism.

Agreed on all those points. I don’t now see any need for a spec change here.

sideshowbarker commented 2 months ago

@dotnetCarpenter I closed this not because I disagree with anything in your comment, but because it seems like what you’ve commented on would be better handled instead though a separate issue that you could raise