mozilla / standards-positions

https://mozilla.github.io/standards-positions/
Mozilla Public License 2.0

No-Vary-Search #717

Closed liviutinta closed 1 day ago

liviutinta commented 1 year ago

Request for Mozilla Position on an Emerging Web Specification

Other information

TAG Design Review: w3ctag/design-reviews#797
WebKit standards-positions issue: https://github.com/WebKit/standards-positions/issues/106

valenting commented 1 year ago

@liviutinta @domenic

I'm a bit curious how this would work in practice. Currently when we do a fetch request for

[1]. http://example.com/?a=1&b=2

we use the URL as part of the cache key. Under no-vary-search, if the response comes back with a No-Vary-Search: params, except=("a") header, that means that future requests for:

[2]. http://example.com/?a=1
[3]. http://example.com/?a=1&b=3

[2] and [3] Should be loaded from the cache, right? But what would the cache key for [1] be and what would the browser's strategy be to look up [2] and [3]?

Would we key [1] as no-vary-search|http://example.com/ and for [2] and [3] also look up a cache entry for the full URL but also for no-vary-search|{url-with-no-query}, then check if the response's No-Vary-Search header value makes it acceptable? Maybe you had a different keying strategy in mind, because this one adds an extra lookup for all requests that would otherwise be a cache miss.
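To make the keying question concrete, here is a minimal sketch of the two-step lookup described above. It is only an illustration, not how any browser implements it: `Policy`, `TwoStepCache`, and `base_key` are hypothetical names, and the policy object stands in for an already-parsed `No-Vary-Search: params, except=("a")` value rather than parsing the real header syntax.

```python
from dataclasses import dataclass
from urllib.parse import urlsplit, parse_qsl


@dataclass(frozen=True)
class Policy:
    """Hypothetical stand-in for a parsed `params, except=(...)` header value."""
    keep: frozenset  # the only query parameters that still distinguish URLs


def base_key(url):
    """URL with the query string removed (the assumed secondary cache key)."""
    parts = urlsplit(url)
    return f"{parts.scheme}://{parts.netloc}{parts.path}"


def significant_params(url, policy):
    """Query pairs that the policy does not tell the cache to ignore."""
    pairs = parse_qsl(urlsplit(url).query, keep_blank_values=True)
    return [(k, v) for k, v in pairs if k in policy.keep]


class TwoStepCache:
    def __init__(self):
        self.by_url = {}   # exact URL -> body
        self.by_base = {}  # URL without query -> [(stored URL, policy, body)]

    def store(self, url, body, policy=None):
        self.by_url[url] = body
        if policy is not None:
            entry = (url, policy, body)
            self.by_base.setdefault(base_key(url), []).append(entry)

    def lookup(self, url):
        # First lookup: the ordinary exact-URL key.
        if url in self.by_url:
            return self.by_url[url]
        # Second lookup: only reached on what would otherwise be a miss.
        for stored_url, policy, body in self.by_base.get(base_key(url), []):
            if significant_params(url, policy) == significant_params(stored_url, policy):
                return body
        return None
```

Storing [1] with a policy that keeps only `a` lets [2] and [3] hit through the second lookup, while a request with `a=2` still misses. The cost raised above is visible in `lookup`: every exact-key miss pays for a second probe.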


The site has traded off nice URLs and accurate utm_source values, in favor of better cache utilization. If this is not desired, the site could avoid using No-Vary-Search on redirect responses, or it could fix up the user-facing URL using history.replaceState() shortly after loading. (The latter might be good practice anyway!)

I have a feeling this is going to cause some interesting issues later on - why does the URL contain ?user=John but the website say Hi Mike? I agree that the site could fix up the URL, but I'm wondering if the site's tech stack might be complex enough that it would be easy to miss that.


I'm also a bit worried that this header might be easy to misuse: Let's say someone only wants to apply the No-Vary-Search to example.com/static/ but instead applies it to everything. Loading example.com/?pingbackto=John followed by example.com/?pingbackto=Mike would instead ping John twice. Depending on the contents of the "ping" the severity of this might vary.

domenic commented 1 year ago

Thanks for taking a look!

[2] and [3] Should be loaded from the cache, right?

Right.

But what would the cache key for [1] be and what would the browser's strategy be to look up [2] and [3]?

For the HTTP cache, @ricea has looked into this a bit for Chromium in this document. The summary is that it would be a nontrivial retrofit to make performant, as you gesture at.

For other caches, e.g. preloading caches, things are much easier, because they tend to contain far fewer than 100 entries total, so even just a linear scan is efficient enough. That's also where the most benefit lies in the near term, at least from Chromium's perspective.
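As a rough illustration of the linear-scan approach for a small preloading cache, the sketch below compares the requested URL against each stored entry after dropping the parameters that entry's response said to ignore. The names `normalized` and `find_prefetched`, and the `(stored_url, ignored_params, body)` entry shape, are assumptions made for this example, not Chromium's data structures.

```python
from urllib.parse import urlsplit, parse_qsl


def normalized(url, ignored):
    """Reduce a URL to the parts that still matter once `ignored` params are dropped."""
    parts = urlsplit(url)
    kept = [(k, v)
            for k, v in parse_qsl(parts.query, keep_blank_values=True)
            if k not in ignored]
    return (parts.scheme, parts.netloc, parts.path, tuple(kept))


def find_prefetched(entries, url):
    """Linear scan; `entries` is a list of (stored_url, ignored_params, body)."""
    for stored_url, ignored, body in entries:
        # Each entry is compared under its own response's ignore list.
        if normalized(stored_url, ignored) == normalized(url, ignored):
            return body
    return None
```

With only a handful of entries, the scan costs one URL parse per entry, which is why no special keying scheme is needed in this case.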

For cases like intermediary HTTP caches, e.g. on a CDN, I'm not very experienced but I imagine the situation lies somewhere between the two.

I have a feeling this is going to cause some interesting issues later on - why does the URL contain ?user=John but the website say Hi Mike? I agree that the site could fix up the URL, but I'm wondering if the site's tech stack might be complex enough that it would be easy to miss that.

In general it seems like a bad idea to tell the browser to ignore a query parameter, if you're using it for something as important as greeting the user with their name. Or at least, if you're planning on using it that way, you do indeed need to invest more effort in compensating for what you've done.

(Somewhat related: the discussion about client-side rendering, and how No-Vary-Search is not a great fit for that since most client-side rendering varies based on the path instead of the query.)

I'm also a bit worried that this header might be easy to misuse: Let's say someone only wants to apply the No-Vary-Search to example.com/static/ but instead applies it to everything.

Yes, misconfiguring your server with wrong HTTP headers will generally lead to unfortunate results. I'd say that in this regard, No-Vary-Search is slightly more damaging than typical caching HTTP headers like Vary, Cache-Control, etc., but less damaging than ones like Strict-Transport-Security, Content-Security-Policy, Clear-Site-Data, or Access-Control-Allow-Origin.

liviutinta commented 2 months ago

An update: No-Vary-Search header support for navigational prefetch shipped in Chrome 121, and will ship for prerender in Chrome 127.

Chrome status entries:

valenting commented 2 days ago

I think this is a good proposal towards increasing the cache-hit rate in common internet scenarios.

My main concern was that a cache implementing No-Vary-Search would need to perform two lookups in its disk cache in order to retrieve a resource instead of one, even for websites that do not use No-Vary-Search. However, it seems we're moving in that direction anyway with compression-dictionary.

I would like to see the following issue addressed in some way, but I don't think that's a blocker. No-Vary-Search producing multiple matches · Issue #210 · WICG/nav-speculation
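A small sketch of how a multi-match can arise, assuming two entries for the same path ended up in the cache and both responses asked to ignore `b`. The helper names and the entry shape are hypothetical; the point is only that one request can be equivalent to more than one stored URL, so the result depends on which entry an implementation picks.

```python
from urllib.parse import urlsplit, parse_qsl


def normalized(url, ignored):
    """Reduce a URL to the parts that still matter once `ignored` params are dropped."""
    parts = urlsplit(url)
    kept = [(k, v)
            for k, v in parse_qsl(parts.query, keep_blank_values=True)
            if k not in ignored]
    return (parts.scheme, parts.netloc, parts.path, tuple(kept))


def all_matches(entries, url):
    """Return every stored body whose URL is equivalent to `url` under that entry's rule."""
    return [body
            for stored_url, ignored, body in entries
            if normalized(stored_url, ignored) == normalized(url, ignored)]


# Two stored responses that differ only in the ignored parameter `b`.
entries = [
    ("https://example.com/?a=1&b=2", frozenset({"b"}), "response for b=2"),
    ("https://example.com/?a=1&b=3", frozenset({"b"}), "response for b=3"),
]

# A request with yet another value of `b` is equivalent to both entries.
matches = all_matches(entries, "https://example.com/?a=1&b=4")
```

Here `matches` holds both stored responses. Unless the selection rule is specified, two implementations could legitimately serve different bodies for the same request.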

martinthomson commented 1 day ago

I've marked this as "positive", with a concern about the potential interoperability risk that comes from the multi-match thing.