w3c / webappsec-clear-site-data

WebAppSec Clear Site Data
https://w3c.github.io/webappsec-clear-site-data/

Clear a specific URL from cache #81

Open yoavweiss opened 2 months ago

yoavweiss commented 2 months ago

CDNs have the notion of "hold-till-told" cache semantics, where resources can have long-term cache lifetimes but can be purged once a certain event happens (e.g. a typo or a pricing error is discovered).
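As a minimal sketch of what "hold-till-told" means, assuming a toy in-memory cache: entries are kept for a very long TTL, and an explicit purge is the only thing that removes them early. `HoldTillToldCache` and its method names are illustrative only and do not correspond to any CDN's actual API.

```python
import time


class HoldTillToldCache:
    """Toy cache: entries live for a long TTL unless explicitly purged."""

    def __init__(self, ttl_seconds=365 * 24 * 3600):
        self.ttl_seconds = ttl_seconds
        self._entries = {}  # url -> (body, stored_at)

    def store(self, url, body, now=None):
        """Cache a response body for a URL, stamped with the current time."""
        self._entries[url] = (body, time.time() if now is None else now)

    def lookup(self, url, now=None):
        """Return the cached body, or None if it is missing or expired."""
        entry = self._entries.get(url)
        if entry is None:
            return None
        body, stored_at = entry
        current = time.time() if now is None else now
        if current - stored_at > self.ttl_seconds:
            del self._entries[url]
            return None
        return body

    def purge(self, url):
        """The "told" part: drop a single URL; return whether it was cached."""
        return self._entries.pop(url, None) is not None
```

The `now` parameter exists only so the behaviour can be exercised without waiting on a real clock.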

It seems like such a notion would be useful on the web as well, even if the means of distributing a "purge" signal are different, and essentially client-driven.

One example that comes to mind of state that frequently changes HTML content is login state. Logged-in users get slightly different experiences compared to their previously-not-yet-logged-in selves on the same URLs.

It seems like it might be beneficial to enable sites to cache their HTML pages, while enabling them to purge them in case e.g. a user logs in.

While more expansive purging capabilities (e.g. the ones proposed in cache groups) can be easier to manage from a site author's perspective, a single URL purge may be significantly easier to implement, and still useful on its own.

/cc @tvereecke @mnot

mnot commented 2 months ago

Not sure logged-in state is the right example here -- that's best mirrored in cookie state (and thus Vary -- see e.g. availability hints here).

But the use case is interesting -- my go-to example here is 'what if we publish something and legal says it has to come down NOW'?

Are you thinking something like a protocol from server to browser, or just a new JS API? In the latter case, sites would need to write the protocol themselves, which means integrating it with CDN purges would potentially not be as seamless.

See also proposals like LCI, which can scale out to browser deployment, at the cost of a modest latency hit for invalidations (e.g., 5-30 secs). Of course, you could also do it with something like SSE.
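For the SSE option, a rough sketch of what a purge signal could look like on the wire, using the standard Server-Sent Events framing (`event:` and `data:` fields, blank line as terminator). The event name `purge` and both function names are assumptions made for illustration; nothing in this thread or in any spec defines them, and what the receiver does with each URL is left open.

```python
def sse_purge_event(url):
    """Serialize one invalidation as a Server-Sent Events message.

    The event name "purge" is a hypothetical choice for this sketch.
    """
    if "\n" in url or "\r" in url:
        raise ValueError("URL must not contain line breaks")
    return f"event: purge\ndata: {url}\n\n"


def parse_purge_events(stream_text):
    """Extract the URLs carried by "purge" events from raw SSE text."""
    urls = []
    for block in stream_text.split("\n\n"):
        fields = {}
        for line in block.split("\n"):
            name, sep, value = line.partition(":")
            if sep:
                # SSE strips a single leading space from the field value.
                fields[name] = value[1:] if value.startswith(" ") else value
        if fields.get("event") == "purge" and "data" in fields:
            urls.append(fields["data"])
    return urls
```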

yoavweiss commented 2 months ago

> Not sure logged in is the right example here -- that's best mirrored in cookie state (and thus Vary -- see eg availability hints here).

Logged in state is not necessarily the only state reflected in cookies, so varying on them often results in very little cacheability.
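To illustrate why, here is a simplified sketch of how a cache might derive a key when a response carries `Vary: Cookie`. It assumes the cache compares the raw header value; `cache_key` is a hypothetical helper, and real HTTP caches apply more nuanced matching rules.

```python
def cache_key(url, request_headers, vary):
    """Simplified cache key: the URL plus the values of headers named in Vary."""
    names = [name.strip().lower() for name in vary.split(",") if name.strip()]
    lowered = {k.lower(): v for k, v in request_headers.items()}
    return (url, tuple((name, lowered.get(name, "")) for name in names))


# Same session, but an unrelated analytics cookie differs between requests.
first = cache_key("https://example.com/", {"Cookie": "session=abc; _ga=1"}, "Cookie")
second = cache_key("https://example.com/", {"Cookie": "session=abc; _ga=2"}, "Cookie")
print(first == second)  # False: the stored response cannot be reused
```

Any cookie that changes between requests, whether or not it relates to login, produces a different key, so the stored response is rarely reusable.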

> But the use case is interesting -- my go-to example here is 'what if we publish something and legal says it has to come down NOW'?

Yup! Retractions, typos and price errors are all good reasons to purge caches, and the risk of them living in the browser cache means that HTML assets are often non-cacheable, even if they could be.

> Are you thinking something like a protocol from server to browser, or just a new JS API?

I was thinking of expanding the Clear-Site-Data header so that developers can provide a URL value, and have that URL be purged from cache. A JS API may also be desired, e.g. if user actions mean that previously cached assets are now invalidated - but that wasn't my first thought for this.
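A sketch of what building such a header value might look like. Today the header carries a comma-separated list of quoted type strings such as `"cache"`. The `urls` argument below is purely hypothetical: it models one possible reading of "provide a URL value", and no spec or browser defines that syntax. `clear_site_data_value` is an illustrative helper, not an existing API.

```python
# A subset of the types defined by the Clear Site Data spec.
STANDARD_TYPES = {"cache", "cookies", "storage", "executionContexts", "*"}


def clear_site_data_value(types=(), urls=()):
    """Build a Clear-Site-Data header value as a list of quoted strings.

    `types` are existing directives. `urls` is HYPOTHETICAL: it stands in
    for the URL-valued form suggested in this thread.
    """
    for t in types:
        if t not in STANDARD_TYPES:
            raise ValueError(f"unknown Clear-Site-Data type: {t!r}")
    for u in urls:
        if '"' in u or "," in u:
            raise ValueError(f"URL must not contain quotes or commas: {u!r}")
    return ", ".join(f'"{item}"' for item in [*types, *urls])


# Existing behaviour: clears the whole cache for the origin.
print("Clear-Site-Data:", clear_site_data_value(types=["cache"]))
# Hypothetical: purge a single URL, e.g. on a login response.
print("Clear-Site-Data:", clear_site_data_value(urls=["/account"]))
```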

mnot commented 2 months ago

> Logged in state is not necessarily the only state reflected in cookies, so varying on them often results in very little cacheability.

Availability hints would fix that :)

> I was thinking of expanding the Clear-Site-Data header so that developers can provide a URL value, and have that URL be purged from cache. A JS API may also be desired, e.g. if user actions mean that previously cached assets are now invalidated - but that wasn't my first thought for this.

That means they'd first have to purge from their CDN, then trigger the purge in browser caches, in two distinct steps. Not the end of the world, but not terribly elegant or integrated...

yoavweiss commented 2 months ago

> That means they'd first have to purge from their CDN, then trigger the purge in browser caches, in two distinct steps. Not the end of the world, but not terribly elegant or integrated...

Could CDNs take Clear-Site-Data as a purging signal? I guess its semantics are not very clear (e.g. it could be meant as a private cache purge rather than a public cache purge).