kyraseevers / Partitioning-visited-links-history

A proposal to partition :visited link history by top-level site and frame origin.

Rationale for using Frame Origin in Partition Key #7

Open sysrqb opened 1 month ago

sysrqb commented 1 month ago

As we consider moving toward triple-keyed partitions, some of those keys use an ancestor-chain bit while others use the frame origin. Can you describe the trade-off of choosing one over the other, and why frame origin is best in this case? Thanks

kyraseevers commented 1 month ago

Thanks for your question!

We chose the triple key containing the frame origin because it lets us adhere to the same-origin policy, preventing cross-origin frame leaks. (And we chose the top-level site because it lets us prevent cross-site tracking.)

Ancestor Chain Bit or ACB can be helpful for APIs which can personalize the content of inner frames in a nested or A->B->A style scenario. This includes the APIs you are likely familiar with that interact with cookies and storage. To quote @arturjanc directly: "Without the ACB, A->B->A scenarios would allow B to embed resources from A authenticated with top-level A's credentials, thereby leaving them vulnerable to cross-site attacks from B."

Since :visited link styling does not personalize the content of the frame it is applied to, and no credentials are transferred from the top-level document to subframes, we believe the ACB is unnecessary in this model.
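To make the triple key concrete, here is a minimal sketch of a visited-link store partitioned by link URL, top-level site, and frame origin. The class and method names (`VisitedKey`, `PartitionedVisitedLinks`, `record_visit`, `is_visited`) are hypothetical and exist only for illustration; they are not taken from any browser implementation, and sites and origins are passed in as plain strings rather than computed.

```python
from dataclasses import dataclass
from typing import Set


@dataclass(frozen=True)
class VisitedKey:
    """Hypothetical triple key: the link itself plus the two partition components."""
    link_url: str
    top_level_site: str
    frame_origin: str


class PartitionedVisitedLinks:
    """Illustrative store: a link is styled as :visited only within its own partition."""

    def __init__(self) -> None:
        self._entries: Set[VisitedKey] = set()

    def record_visit(self, link_url: str, top_level_site: str, frame_origin: str) -> None:
        # Record the visit only for the partition in which the click happened.
        self._entries.add(VisitedKey(link_url, top_level_site, frame_origin))

    def is_visited(self, link_url: str, top_level_site: str, frame_origin: str) -> bool:
        # All three components must match for the link to be reported as visited.
        return VisitedKey(link_url, top_level_site, frame_origin) in self._entries
```

Under this sketch, a visit recorded in a frame of origin `https://a.example` under top-level site `https://a.example` is not visible to a `https://b.example` frame on the same page, nor to an `https://a.example` frame embedded under a different top-level site.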

arturjanc commented 1 month ago

Like @kyraseevers said, it doesn't seem that in the case of visited links there would be much actual security/privacy benefit in including the ancestor-chain bit. The one advantage I can think of is that, if other mechanisms all include that bit when calculating the partition, there's some value in maintaining consistency across APIs. IIRC @annevk was interested in unifying this a couple of years ago -- but I'm not sure how close we can get to a truly unified partitioning scheme, given the diversity of the APIs that need to be partitioned, and the fact that we might want to intentionally make slightly different partitioning decisions based on performance, etc.

So my guess is that we don't have a strong opinion and could go either way. (But, by default, I'd probably still skip the bit unless we find cases where it has value.)

annevk commented 1 month ago

Let me try to unpack a bit since I always manage to confuse myself. Triple-key in this context means the key for the item itself (i.e., the visited URL), the top-level document site, and the document origin of the document the item was in, right?

I thought in some cases the document origin was going to be the document site, but perhaps that's no longer the case? And I am thinking about consistency here as we also had requests to make the partition more visible to web developers, e.g., through an HTTP request header or API. If we have many different flavors of partitions that will be a harder problem to solve and will also make it harder for web developers to develop an intuition as to what is happening. (And to a lesser extent for browser developers as to what needs to happen.)

Then finally there is the question as to whether A->B->A can result in an issue with the triple-key scheme as laid out above. I think there is some chance for confusion as the inner A will appear "personalized" by indicating visited links, but is otherwise almost treated the same way B is (at least by default). Not sure I see a way to successfully exploit it though.
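The A->B->A case can be sketched as follows. This is an illustration under stated assumptions, not a description of any shipped implementation: `Frame`, `has_cross_site_ancestor`, `visited_partition`, and `visited_partition_with_acb` are hypothetical names, and the ancestor-chain bit is modeled here as "some frame in the chain is cross-site with the top-level site".

```python
from typing import List, NamedTuple, Tuple


class Frame(NamedTuple):
    site: str    # schemeful site of the document, supplied directly for simplicity
    origin: str  # origin of the document


def has_cross_site_ancestor(chain: List[Frame]) -> bool:
    # Assumed definition of the bit: any frame in the chain differs in site from the top.
    top_site = chain[0].site
    return any(frame.site != top_site for frame in chain)


def visited_partition(chain: List[Frame]) -> Tuple[str, str]:
    # Partition as proposed in this thread: (top-level site, frame origin).
    return (chain[0].site, chain[-1].origin)


def visited_partition_with_acb(chain: List[Frame]) -> Tuple[str, str, bool]:
    # The alternative under discussion: the same partition plus the ancestor-chain bit.
    return (chain[0].site, chain[-1].origin, has_cross_site_ancestor(chain))


a = Frame(site="https://a.example", origin="https://a.example")
b = Frame(site="https://b.example", origin="https://b.example")

top_level_a = [a]
nested_a = [a, b, a]  # A embeds B, which embeds A again
```

With these definitions, `visited_partition(top_level_a)` and `visited_partition(nested_a)` are equal, so the inner A shows the same visited links as the top-level A. Adding the bit separates the two partitions, which is the only difference the ACB would make for this API.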


In terms of what we'd expose to web developers as a partition I suppose since https://github.com/w3c/webappsec-fetch-metadata/pull/89#issuecomment-2090386666 we have a more concrete idea and that does allow for making a different decision on a per-API basis at the moment. Where this API would mainly care about the relationship with the top, another API might take more of the chain into account. It's still not great for learning though.

arturjanc commented 1 month ago

> Let me try to unpack a bit since I always manage to confuse myself. Triple-key in this context means the key for the item itself (i.e., the visited URL), the top-level document site, and the document origin of the document the item was in, right?

Yes, exactly.

> I thought in some cases the document origin was going to be the document site, but perhaps that's no longer the case?

Yes, the triple-keyed HTTP cache in Chrome uses the document site rather than origin, because using the origin would prevent us from ever sharing the cache between sibling same-site origins, which seemed too costly from a performance perspective (though I don't know if we have actual data about the impact). But whenever possible, arguably the more principled solution is to use the document origin to prevent the state from leaking out to other origins within the site, which is what we went for here.

> And I am thinking about consistency here as we also had requests to make the partition more visible to web developers, e.g., through an HTTP request header or API. If we have many different flavors of partitions that will be a harder problem to solve

This is true. The problem is that even in Chrome we already have different partitioning approaches for different kinds of partitioned state, e.g.

  1. HTTP cache uses (<schemeful top-level site>, <schemeful document site>, <is-cross-site navigation?>*, [item URL])
  2. Network state uses (<schemeful top-level site>, <is document cross-site with top-level?>, [item])
  3. :visited links history, as proposed here, uses (<schemeful top-level site>, <document origin>, [URL])
  4. Local state (e.g. localStorage, IndexedDB) is implicitly scoped to the origin, but also partitioned by schemeful top-level site and the ancestor chain bit.
  5. Cookies are their own can of worms. (But, in my mental model, the partition is, roughly: the top-level site, the cookie scope, and the ancestor chain bit.)

[*] This is a bit that's meant to make sure that cross-site navigations to a subresource (e.g. evil.com navigating to victim.com/data/[username].json) don't re-use the cache entry populated when the resource is used in a same-site context. Otherwise the timing of navigations can leak the presence of arbitrary entries in the cache. This isn't fully launched yet though, FWIW.
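The first four key shapes listed above can be written out as plain tuples to compare them side by side. This is only a sketch of the list as described in this comment: the type and field names are hypothetical, cookies are omitted because their scope is described only roughly, and the example sites and origins are made up.

```python
from typing import NamedTuple


class HttpCacheKey(NamedTuple):
    top_level_site: str
    document_site: str
    is_cross_site_navigation: bool
    item_url: str


class NetworkStateKey(NamedTuple):
    top_level_site: str
    is_document_cross_site_with_top: bool
    item: str


class VisitedLinkKey(NamedTuple):
    top_level_site: str
    document_origin: str
    url: str


class LocalStorageKey(NamedTuple):
    top_level_site: str
    origin: str  # local state is implicitly scoped to the origin
    ancestor_chain_bit: bool


# Two sibling origins within one site, both embedded under the same top-level site.
TOP = "https://shop.example"
SITE = "https://shop.example"
ORIGIN_1 = "https://cart.shop.example"
ORIGIN_2 = "https://reviews.shop.example"
URL = "https://cdn.example/lib.js"

cache_1 = HttpCacheKey(TOP, SITE, False, URL)
cache_2 = HttpCacheKey(TOP, SITE, False, URL)
visited_1 = VisitedLinkKey(TOP, ORIGIN_1, URL)
visited_2 = VisitedLinkKey(TOP, ORIGIN_2, URL)
```

Here `cache_1 == cache_2`, so the sibling origins would share a cache entry because the cache is keyed on the document site, while `visited_1 != visited_2` because the history key uses the document origin.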

IMHO the challenge with unifying these is two-fold:

  1. The performance trade-offs are very different for these mechanisms, e.g. there are non-trivial regressions in web-wide metrics associated with fully triple-keying network state, but there is no performance impact associated with triple-keying history (even if we tighten the scope to document origin, rather than site).
  2. The bits which we need for partitioning some mechanisms don't really make sense for others. For example, the "is-cross-site-navigation" bit we have for HTTP cache wouldn't make sense for local storage mechanisms because that would be equivalent to SameSite=Strict cookies and you'd never have local state after a cross-site navigation.

So while I like the overall goal of making this simpler, I'm a bit skeptical that there is a single partitioning approach that would work across these mechanisms. We could likely create an abstraction layer and have each mechanism call into it somehow; we could probably also provide information about the partition in specific contexts where it makes sense (e.g. https://github.com/w3c/webappsec-fetch-metadata/pull/89#issuecomment-2090386666 is reasonable for exposing information about the document's relationship with its ancestor chain, which influences cookies and local storage) -- but I don't think this will be enough for us to tell developers: "this is your storage partition for everything".

arturjanc commented 1 month ago

I guess for the specific purposes of this issue, my opinion is that we likely shouldn't block on figuring out the broader, unified, partitioning story because that's going to be tricky and take a long time. If folks feel strongly that we should add the ancestor-chain bit to the history partition we can likely do it (practically, this will be okay for :visited either way) - but I feel like we'd largely be speculating about what key for :visited will be more consistent with what we eventually land on, which is a bit of a gamble. So, my instinct would be to keep it simple and only add the information we need to the history partitioning key, and then possibly update the key if/when we refactor all the partitioned APIs to make things consistent.