privacycg / proposals

New proposals in the Privacy Community Group
https://privacycg.github.io

Full storage partitioning / double-keying #4

Closed othermaciej closed 4 years ago

othermaciej commented 4 years ago

Many privacy measures have focused on cookies and replacements for them. However, many other storage types can, without appropriate restrictions, be used for stateful cross-site tracking.

This includes explicit storage types like LocalStorage or IndexedDB, implicit storage, like the HTTP cache, communication channels like ServiceWorker and BroadcastChannel, and subtle state like HSTS flags.

When such state is accessible from a third-party context, it may enable cross-site tracking. Some state like this, e.g. the HTTP cache, can effectively be read by a passive resource, and thus any third-party resource may be affected. Other such state requires a scripting context in a third-party origin and thus would require an iframe or similar mechanism.
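To make the passive-resource case concrete, here is a toy model of an HTTP cache, with and without a top-level-site component in the cache key. It is only a sketch: `HttpCache`, `fetch`, and the example URLs are hypothetical names for illustration, and a real attack would infer the hit from load timing rather than a return value.

```python
from typing import Set, Tuple


class HttpCache:
    """Toy cache that records only whether a URL has been fetched before."""

    def __init__(self, double_keyed: bool) -> None:
        self.double_keyed = double_keyed
        self._entries: Set[Tuple[str, ...]] = set()

    def fetch(self, top_level_site: str, url: str) -> bool:
        """Return True on a cache hit, then record the entry."""
        # A single-keyed cache ignores which site embedded the resource.
        key = (top_level_site, url) if self.double_keyed else (url,)
        hit = key in self._entries
        self._entries.add(key)
        return hit


resource = "https://tracker.example/pixel.png"

shared = HttpCache(double_keyed=False)
shared.fetch("https://news.example", resource)
# The embed on a second site observes a hit, revealing the earlier visit.
leaks = shared.fetch("https://shop.example", resource)

keyed = HttpCache(double_keyed=True)
keyed.fetch("https://news.example", resource)
# With the top-level site in the key, the second site starts cold.
isolated = not keyed.fetch("https://shop.example", resource)
```

In this model `leaks` and `isolated` are both true: the single-keyed cache carries state across sites, the double-keyed one does not.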

For any given storage mechanism, multiple approaches are possible. One approach is to totally deny access to the storage mechanism in a third party context. Another is to partition or double-key it; that is, have a completely separate instance of the storage based on the origin of the top-level browsing context. Yet another is to expose a unique ephemeral storage area.
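The three approaches can be sketched roughly as follows. This is an illustrative model, not a description of what any engine does: `StorageRouter`, `area_for`, and the policy names are made up for the example, and "third party" is simplified to "frame origin differs from top-level origin".

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


class StorageAccessDenied(Exception):
    """Raised when the 'deny' policy blocks third-party storage."""


@dataclass
class StorageRouter:
    policy: str  # hypothetical policy names: "deny", "partition", "ephemeral"
    _areas: Dict[Tuple[str, ...], Dict[str, str]] = field(default_factory=dict)

    def area_for(self, top_level_origin: str, frame_origin: str) -> Dict[str, str]:
        """Return the storage area a frame sees under the configured policy."""
        if frame_origin == top_level_origin:
            # First-party context: ordinary per-origin storage.
            return self._areas.setdefault((frame_origin,), {})
        if self.policy == "deny":
            raise StorageAccessDenied(frame_origin)
        if self.policy == "partition":
            # Double-keyed: a separate instance per top-level origin.
            return self._areas.setdefault((top_level_origin, frame_origin), {})
        if self.policy == "ephemeral":
            # Fresh, unpersisted area; a real browser would presumably tie
            # its lifetime to the document rather than to a single call.
            return {}
        raise ValueError(f"unknown policy: {self.policy}")


router = StorageRouter("partition")
router.area_for("https://news.example", "https://tracker.example")["id"] = "u1"
seen_elsewhere = router.area_for("https://shop.example", "https://tracker.example")
```

Under the "partition" policy `seen_elsewhere` is empty, so the identifier written under one top-level site is not visible under another.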

WebKit has long partitioned many storage types, including (at one point controversially) the HTTP cache. Unfortunately, exactly what we did is not documented. Blink and Gecko also have work in progress to add pervasive double keying.

It would be useful to agree on a common behavior, and to push these changes into standards as requirements.

Changes along these lines would ultimately go into HTML, Fetch, and perhaps various IETF deliverables. Perhaps also other standalone Web APIs that create a storage mechanism. However, it could be useful to have a central location and issue tracker to develop a plan and proposed behavior before filing issues/PRs against the relevant specifications.

kdzwinel commented 4 years ago

This includes explicit storage types like LocalStorage or IndexedDB, implicit storage, like the HTTP cache, communication channels like ServiceWorker and BroadcastChannel, and subtle state like HSTS flags.

FWIW Chrome's Storage Isolation Project does a great job listing many (all?) state mechanisms in browsers: https://docs.google.com/document/d/1V8sFDCEYTXZmwKa_qWUfTVNAuBcPsu6FC0PhqMD6KKQ/edit#heading=h.5wyylz23hbkc

It would be useful to agree on a common behavior, and to push these changes into standards as requirements.

+1 Double keying of caches (and socket pools) being a requirement in e.g. Resource Timing API would help us avoid many privacy concerns (https://github.com/w3c/resource-timing/issues/222).

Changes along these lines would ultimately go into HTML, Fetch, and perhaps various IETF deliverables.

Double (and triple) keying of HTTP cache is already being discussed here: https://github.com/whatwg/fetch/issues/904 . As far as I understand though, this does not include any other state mechanisms.

ehsan commented 4 years ago

Mozilla has two implementations of storage partitioning, one shipped and one in the works:

We're also actively working on partitioning our HTTP cache and some related caches (per https://github.com/whatwg/fetch/issues/904).

Given our ongoing work in this area, Mozilla is supportive of this work and of agreeing on a common behaviour.

othermaciej commented 4 years ago

For avoidance of doubt, Apple also supports this work.

TanviHacks commented 4 years ago

Would anyone like to volunteer to be an editor for this?

othermaciej commented 4 years ago

If no one else can step up, I'm willing to make an Explainer that states the problem and surveys what different browser engines do for this currently. But I definitely will not be able to turn this into formal spec language once we agree on solutions.

annevk commented 4 years ago

Part of this seems like it should be part of storage access, right? That is, both are talking about changes to keying of site storage (loosely defined at https://storage.spec.whatwg.org/#infrastructure and probably in need of some changes, in particular for cookies). Now storage access might also allow for changes to the key, depending, but it seems best if that's sorted out together.

Thus far at Mozilla we've been thinking about separate "cache" and "storage" keys, so that the HTTP cache, for instance, can always use multiple keys and be static, whereas the "storage" key might be able to change depending on the storage access API. Unfortunately it doesn't seem to be quite that clean: while service workers and the cache API are logically "storage", putting them there is not necessarily great.

Also, some infrastructure work here is being done by @shivanigithub. See https://github.com/whatwg/html/pull/4966 and https://github.com/whatwg/fetch/pull/943 for an early adopter (which I should get to reviewing and I'd also welcome review on those from others here). Having access to the top-level origin will make defining these keys easier.

annevk commented 4 years ago

(I could see them being isolated if "storage access" only applies to a UA-determined set of sites and is about blocked storage -> storage and this tackles partitioning the set of sites that remain and "storage access" does nothing there, but it seems too early to make that kind of decision, right?)

othermaciej commented 4 years ago

They are related, but I think neither is a part of the other.

Storage partitioning and cache partitioning can clearly be implemented and/or specified without Storage Access API. Safari shipped a form of partitioning many years before we shipped Storage Access API.

Storage Access API's model is easier to explain if the specs require partitioning, and Storage Access API merely explains how it can be selectively undone. This can be done as two separate layers. But it is also reasonable to specify it as an API that undoes UA-specified partitioning or blocking, at least as an interim step.

Also worth noting, specifying partitioning is not sufficient as a mechanism to underpin current implementations of Storage Access API. The most essential thing SAA undoes in many browsers is third-party cookie blocking. While Safari briefly had selective third-party cookie partitioning, currently the measure ITP takes against cookies is fully blocking them.

othermaciej commented 4 years ago

Also, I think it's reasonable to have logically multiple partitioning keys, of which Storage Access API undoes only a subset. Instead of storage/cache, the separation should probably be explicit/implicit. Cookies, LocalStorage and IndexedDB are all explicit storage APIs. HTTP Cache is not, but there are other things in that category that are worth partitioning. For example, if network-level state like TLS session state or Alt-Svc are partitioned, that should probably not be undone by Storage Access API.

(And thanks for the pointers to Google infrastructure/proposals in this area.)
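One way to picture the explicit/implicit split is a key function in which a Storage Access grant changes the key only for explicit storage. This is a sketch under assumptions: `partition_key`, the mechanism labels, and the tuple-shaped keys are all hypothetical, and cookies are left out because they may end up blocked rather than partitioned.

```python
from typing import Tuple

# Illustrative groupings based on the examples named in this comment.
EXPLICIT = {"localStorage", "IndexedDB"}
IMPLICIT = {"http-cache", "tls-session", "alt-svc"}


def partition_key(
    mechanism: str,
    top_level_site: str,
    frame_site: str,
    storage_access_granted: bool = False,
) -> Tuple[str, ...]:
    """Return the key under which `mechanism` state is stored for a frame."""
    if mechanism in IMPLICIT:
        # Network-level and cache state stays double-keyed regardless of grants.
        return (top_level_site, frame_site)
    if mechanism in EXPLICIT:
        if storage_access_granted or top_level_site == frame_site:
            # Promoted (or already first-party): use the frame's own key.
            return (frame_site,)
        return (top_level_site, frame_site)
    raise ValueError(f"unknown mechanism: {mechanism}")


top, frame = "https://news.example", "https://widget.example"
before = partition_key("localStorage", top, frame)
after = partition_key("localStorage", top, frame, storage_access_granted=True)
cache = partition_key("http-cache", top, frame, storage_access_granted=True)
```

Here `before` and `cache` are both `(top, frame)` while `after` is `(frame,)`: the grant undoes partitioning for the explicit API and leaves the implicit state keyed as it was.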

jkarlin commented 4 years ago

As noted above, I think we're dealing with two states. The default state is that the 3p storage is somehow sharded by key (either no key, an ephemeral key, a 1p key, or a double key), where no key means the browser throws. The other state is one where the document is promoted via Storage Access API, and the key changes to the 1p key if it isn't already.

Perhaps this issue should focus on the first state, but understanding how it transitions to the second is useful.

othermaciej commented 4 years ago

Cookies are also a special case. I don't believe they are partitioned/sharded/double-keyed or whatever in any current browser, and I don't think any browser plans to do it in the future. The present is that many browsers block third-party cookies for some sites. I think a likely future is that all third-party cookies are totally blocked by default (both getting and setting).

jkarlin commented 4 years ago

Would it make sense to partition third-party cookies? If it makes sense to do so for localStorage I don't see why it wouldn't for cookies.

annevk commented 4 years ago

and Storage Access API merely explains how it can be selectively undone

So that would mean there's a transition from "partitioned" to "first-party" storage, right? And unless the "partitioned" storage was ephemeral, the "first-party" would potentially get a lot of additional information?

Instead of storage/cache, the separation should probably be explicit/implicit.

A way I was thinking about is that caches can be appended to and storage can be manipulated. (I.e., you can delete a specific cookie or Indexed DB store, but you cannot delete a specific connection pool or session identifier.)

Cookies are also a special case.

I agree with regard to the status quo, but I wonder what principle backs that, as data can flow between cookies and other storage APIs. HttpOnly is a difference, but I cannot think of a suitable angle for that mattering here.

hober commented 4 years ago

Would anyone like to talk about this on this week's telcon? If so, please add the agenda+ label. (If you can't, let me know and I can add it.)

npdoty commented 4 years ago

For the security threat of private information disclosure from timing attacks by a first party based on the loading of specific resources from another site, there seems to be an analogy to Access-Control-Allow-Origin and related headers, where a server can indicate properties about the data that it's returning and whether it should be accessible by other origins. Should servers also be able to indicate whether caching could be sensitive (like search results pages)? Or is the assumption that all such timing attacks will be sensitive to some degree and we want to mitigate all of them rather than have servers indicate scope?

annevk commented 4 years ago

https://github.com/w3ctag/design-reviews/issues/424 has some additional context, but in general, we cannot trust servers to make decisions in the best interests of end users.

npdoty commented 4 years ago

Thanks for that context @annevk and that makes sense; I mostly just wanted to make sure the concept had been considered. And it might be especially tricky for some of these cases for the server to make those decisions.

If browsers are concerned about implementing completely separate caches for all requests, I do wonder if some of these server-provided headers would be useful places to start, especially to address known attacks on discovering authenticated content. I don't know if browser developers are being held back by interop, efficiency (bandwidth or speed) or just implementation, but it might be that Cache-Control: public and Access-Control-Allow-Origin: * resources don't need the cache separation as urgently as other resources.
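As a rough illustration of that heuristic, the check could look like the sketch below. The function name `may_skip_cache_partitioning` is hypothetical, the logic only inspects the two headers mentioned above, and it inherits the caveat already raised in this thread that server-supplied signals may not reflect the user's interests.

```python
from typing import Mapping


def may_skip_cache_partitioning(headers: Mapping[str, str]) -> bool:
    """True if a response is marked both publicly cacheable and world-readable."""
    # Header names are case-insensitive, so normalise them first.
    normalized = {name.lower(): value for name, value in headers.items()}
    directives = {
        directive.strip().lower()
        for directive in normalized.get("cache-control", "").split(",")
    }
    allow_origin = normalized.get("access-control-allow-origin", "").strip()
    return "public" in directives and allow_origin == "*"


font = {"Cache-Control": "public, max-age=31536000", "Access-Control-Allow-Origin": "*"}
search_results = {"Cache-Control": "private, no-store"}

font_ok = may_skip_cache_partitioning(font)
search_ok = may_skip_cache_partitioning(search_results)
```

In this sketch the font response qualifies (`font_ok` is true) and the search results page does not (`search_ok` is false).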

hober commented 4 years ago

This has been adopted as a Work Item, with @annevk as editor, as of privacycg/privacycg.github.io@fe69d9a.

annevk commented 4 years ago

I put up an introduction proposal at https://github.com/privacycg/storage-partitioning/pull/1.