privacycg / storage-access

The Storage Access API
https://privacycg.github.io/storage-access/
199 stars 26 forks source link

Consider restricting storage access scope back to per-frame #122

Closed johannhof closed 1 year ago

johannhof commented 1 year ago

We've had a lot of discussions in other places, e.g. at TPAC in the past weeks that focused on how we can improve interop, the API design and security properties of the Storage Access API, in light of recent publications such as rSAFor and new security considerations for SAA.

One central idea that stuck with some of us is to reconsider some of the design choices that were made for rSA over time, particularly to revisit per-page storage access and move back to per-frame (I think @martinthomson brought this up first). There are various reasons why I think this is a good idea:

A few, probably incomplete, thoughts on how this would work:

cc @annevk @bvandersloot-mozilla @mreichhoff @cfredric @johnwilander

annevk commented 1 year ago

One thing I noted in an offline conversation is that we'd have to figure out how workers (all of them) end up working if we do this.

Also, given the threat model #113 outlines, how did prior "per-frame" actually work? If document A embeds document B which embeds document C. And B succeeds with rSA. And then C fetches a resource B1. Does that fetch include credentials?

bvandersloot-mozilla commented 1 year ago

Ditto to Anne's question, with the additional thought experiment: document A embeds document B embeds document B* which fetches a resource B1. Does this fetch include credentials?

johannhof commented 1 year ago

I don't remember tbh and it seems like this is up to our own preference, we should really write tests for these cases though. Even the simple B embeds B1 case isn't 100% defined AFAICT.

From a security point of view it seems safest to say that cross-site ancestor break "implicit" storage access (but same-site iframe chains preserve it?).

cfredric commented 1 year ago

I've been thinking about this and getting ready to write a PR, and I wanted to flesh out the conversation a bit around the edge cases - especially navigations and session history (I haven't thought about workers yet).

The way I see it, there are 2 obvious candidate designs for achieving "per-frame" semantics:

  1. Add some state to the BrowsingContext that corresponds to the frame.
  2. Add some state to the Document that is being shown in the frame.

This state will have to be plumbed around to the same places in order to influence whether cookies are actually accessible during a fetch/JS execution (regardless of where the state is stored), so the choice only gets interesting when we think about the lifetime of that data, how/when it's reset, etc.

If we say the state should be on Document (or somewhere that's 1:1 with Document, whatever), then:

If we say that the state should be on the BrowsingContext, then:

So, all this is to say that I'd rather put the new state on Document, rather than on BrowsingContext. This choice leaves a pain point, namely that there's no way to give a document storage access while it is loading (before script execution). I think we can solve that by using a response header (maybe a new Supports-Loading-Mode variant, or a new boolean-valued header) that indicates "give this doc storage access, if a grant already exists". That still requires per-document opt-in so it fits with the security model, and it enables the "I need to load credentialed subresources" use case in a more consistent (and less annoying) way than the load-then-requestStorageAccess-then-reload workflow.

So, before I take a stab at writing a PR - does using Document (instead of BrowsingContext) sound reasonable to folks?

CC @arturjanc

annevk commented 1 year ago

Requiring requestStorageAccess() for each document indeed has some cost for certain navigation flows. I don't know what the state of current implementations is, but I imagine that might be hard to roll out successfully. I thought there was some past discussion about this, but having looked through past issues I can't seem to find it.

I think I'd like to see a lot more detail about the "response header" idea before it goes to a PR.

bvandersloot-mozilla commented 1 year ago

I think I'd like to see a lot more detail about the "response header" idea before it goes to a PR.

Me as well.

Playing the advocate of putting it on the BrowsingContext:

Same-origin navigations may have implicit storage access during the initial load, which is nice for site developers, but could be bad from a security standpoint: if an attacker can navigate a victim frame (with storage access) to a sensitive endpoint, that new endpoint will have storage access before it opts in. (I'm know I'm hand-waving here; I'm not great as an exploit developer, but this just feels like it goes against the security goals.)

I don't think partitioning should be relied upon as a security boundary in this way. If that were the case, a user-granted permission would be able to break the boundary. The sensitive document/endpoint should use SameSite cookies, frame-ancestors, or some other protection to keep it from being used in an inappropriate context.

If the frame refreshes, it'll have implicit storage access during the load. [...] If the top-level document refreshes, however, all the child BrowsingContexts are wiped out, so none of the frames get implicit storage access during the refresh. This feels like a sharp corner for site devs, since it makes the behavior less predictable: if their frame is refreshed at one level, things work, but if any of the parent documents refresh, their frame doesn't work anymore.

I don't think this is that sharp a corner, personally. The dev still has to deal with two primary cases: loading the document without initial storage access and with initial storage access. An outer document refresh is identical to an initial navigation in this case, which must be supported.

IIUC (and I may not, please correct me), the HTML spec reuses BrowsingContexts during cross-site navigations. [...] Importantly, the browser has "failed open" due to the bug. [...] (The same argument applies to any new security feature that impacts cookie access: the feature implementations need to explicitly opt out of inheriting the implicit storage access, which feels to me like the wrong approach for security.)

I would argue for changing the definition of a navigation to clear this state unless a set of conditions are met, rather than enumerating the cases where it would be cleared. This seems more safe and apt to "fail closed," although I concede that it is some, but less, risk that a new feature is created with a value that is a necessary condition for safety that is not added to this set of conditions.

bvandersloot-mozilla commented 1 year ago

Same-origin navigations may have implicit storage access during the initial load, which is nice for site developers, but could be bad from a security standpoint: if an attacker can navigate a victim frame (with storage access) to a sensitive endpoint, that new endpoint will have storage access before it opts in. (I'm know I'm hand-waving here; I'm not great as an exploit developer, but this just feels like it goes against the security goals.)

I don't think partitioning should be relied upon as a security boundary in this way. If that were the case, a user-granted permission would be able to break the boundary. The sensitive document/endpoint should use SameSite cookies, frame-ancestors, or some other protection to keep it from being used in an inappropriate context.

I recognize there is subtlety here that ties into discussion on #113, particularly here. To articulate more clearly: I don't think partitioning should be used as a document-to-document boundary. We can retain the same-origin policy protections by clearing this bit on cross-origin navigations.

cfredric commented 1 year ago

Requiring requestStorageAccess() for each document indeed has some cost for certain navigation flows. I don't know what the state of current implementations is, but I imagine that might be hard to roll out successfully. I thought there was some past discussion about this, but having looked through past issues I can't seem to find it.

Out of curiosity, do you (or @bvandersloot-mozilla) have any usage metrics on how often those navigation flows are hit and reuse some existing storage access grant?

To articulate more clearly: I don't think partitioning should be used as a document-to-document boundary. We can retain the same-origin policy protections by clearing this bit on cross-origin navigations.

Fair point. I admit that while thinking about this, I was assuming a future in which we also relax the permission key to <top level site, embedded site>, which arguably would be safe as long as each frame is required to actively opt in to using that permission. So in these attacks I was also thinking about implicit storage access on same-site, cross-origin navigations. If we clear the state for all cross-origin navigations (thereby removing implicit storage access for a same-site navigation, but still allowing promptless reuse of an existing grant), that solves the security issue - or at least restores the origin as the security boundary, which is consistent with most of the rest of the web platform. That definitely feels better to me.

If the frame refreshes, it'll have implicit storage access during the load. [...] If the top-level document refreshes, however, all the child BrowsingContexts are wiped out, so none of the frames get implicit storage access during the refresh. This feels like a sharp corner for site devs, since it makes the behavior less predictable: if their frame is refreshed at one level, things work, but if any of the parent documents refresh, their frame doesn't work anymore.

I don't think this is that sharp a corner, personally. The dev still has to deal with two primary cases: loading the document without initial storage access and with initial storage access. An outer document refresh is identical to an initial navigation in this case, which must be supported.

Part of my worry here is about the composability of authenticated embeds: Suppose A embeds B which embeds C, and B and C are both authenticated embeds. A refreshes, which wipes out the implicit storage access state for B and C. After loading, B's script detects that it doesn't have storage access, so it calls requestStorageAccess(), then refreshes the frame. (C's script has meanwhile done the same thing, or has at least started to.) B finishes loading with storage access. C finishes loading, but since B refreshed, it didn't get to load with storage access, as its state was wiped out again. So C's script calls requestStorageAccess() a third time, and reloads the frame. The page is finally ready and working properly, after the 3 documents were loaded up to 6 times.

I admittedly don't have a particular use case in mind that would run into this cascading refresh problem, so maybe it's academic. But regardless, I wanted to avoid imposing a penalty on composition, given that composability is a key feature of the web. (Perhaps this is something that a response header could fix in the future, and doesn't have to be fixed now.)

bvandersloot-mozilla commented 1 year ago

[...] The page is finally ready and working properly, after the 3 documents were loaded up to 6 times.

Ouch, I didn't think of that case. That is a sharp corner. I wouldn't write it off as academic outright. There are better choices the developer could make here with the bit on the BrowsingContext, but I don't know if it would be obvious enough to not do it this way.

cfredric commented 1 year ago

Thanks for the discussion in the editors' meeting yesterday. To reiterate some of that discussion here:

One point I wanted to revisit is that of "clearing" state on cross-origin navigations. Anne made the point that the state we're storing could be some type of lookup map (keyed by origin). Given that the state is keyed by origin, I don't think we actually need to clear an origin's state in the BrowsingContext when it does a cross-origin navigation; that way if the user navigates back, or visits a different document on that origin in that BC, they'll still benefit from the implicit storage access. (There's a security subtlety here, where an attacker could navigate a BC to some sensitive origin and load it with implicit storage access iff that BC had already visited that origin and gotten storage access; but this adheres to the SOP, so I think it's ok.) So, this map will grow monotonically over the lifetime of a BC.

Assuming that summary sounds accurate to you all, I'll continue reading and thinking about how to spec this. (I have yet to look into the Fetch spec, and figure out how workers relate to all of this.)

arturjanc commented 1 year ago

I don't have a strong opinion on storing state in the BrowsingContext vs. Document, but I think that, as described, the BrowsingContext approach has some undesirable security properties. Specifically, the embedder's ability to navigate an iframe that was granted storage access to arbitrary same-origin endpoints with credentials re-enables a number of attacks which are prevented by default after the removal of third-party cookies; this includes:

Note that these attacks are possible today in browsers with third-party cookies if a site sets its auth cookies as SameSite=None and doesn't protect itself against embedding by using X-Frame-Options/frame-ancestors. But we should think about the long-term stable state of the platform where the attacks mentioned above are prevented by default; we don't want the Storage Access API to be a footgun that re-enables these attacks, especially considering that embedding is allowed by default in the platform and doesn't require explicit opt-in.

There's a security subtlety here, where an attacker could navigate a BC to some sensitive origin and load it with implicit storage access iff that BC had already visited that origin and gotten storage access; but this adheres to the SOP

It seems like a bit of a stretch to use the SOP as an argument for allowing a storage access grant for one document to permit authenticated embedding of an unrelated same-origin document (the SOP generally gives documents with script execution capabilities access to all of the origin's data, but it doesn't make all security mechanisms origin-scoped).

A better analogy might be if the CORS Access-Control-Allow-Credentials: true response header on one endpoint allowed credentialed access to other endpoints in the same origin, which I think is fairly clearly not what we'd want.

Overall, I don't want to push back on the BrowsingContext-based approach because it seems like it has reasonable ergonomic benefits for developers, but I'd like us to try to not make the Storage Access API recreate the problem of isolation bypasses for origins that use it. For example, if we attach state to the BrowsingContext, could we make it so that cross-BC navigations of the BrowsingContext (e.g. the embedder directly navigating the iframe) would clear the storage access permission, or something along these lines?

annevk commented 1 year ago

@arturjanc note that SameSite=None cookies are also vulnerable to A1 -> B -> A2 embedding scenarios. I like the idea of resetting state when the browsing context (or maybe navigable) was tampered with from the outside.

cfredric commented 1 year ago

note that SameSite=None cookies are also vulnerable to A1 -> B -> A2 embedding scenarios.

I think we can simplify the attack model by removing the navigation to/from B, so it's just A1 -> A2. An attacker could accomplish this via clickjacking.

I like the idea of resetting state when the browsing context (or maybe navigable) was tampered with from the outside.

Does the web platform have a way of distinguishing which same-origin navigations should be considered "safe", and which should have their BC state cleared? I.e., what's the definition of "was tampered with from the outside"?

I think there's no (100% reliable) way to distinguish a "legitimate" A1 -> A2 navigation from one that was caused by an attacker, since they both may be caused by user activity (knowingly or not). So we can't say (e.g.) that it's safe to not clear state if the navigation was caused by a user's click, rather than by a script running in some other frame.

If clickjacking is in the threat model, then it seems to me that all cross-document navigations are potentially unsafe (even if they're same-origin), and we would have to clear the state in those cases. So we could only reuse the BC state during refreshes or navigations to the same URL.

annevk commented 1 year ago

@cfredric with embedding I meant A1 frames B which frames A2 (to attack it). The As are same-site and the B is cross-site. (Sorry for assuming familiarity with the abbreviated description.)

Does the web platform have a way of distinguishing which same-origin navigations should be considered "safe", and which should have their BC state cleared?

Yeah kinda, you can tell if your navigable was responsible for the navigation or if another navigable instructed you to navigate. In the latter case you could assume malice for the purposes of this feature. (This isn't just about clickjacking, it's also just about making a privileged navigation, which doesn't necessarily require a click.)

cfredric commented 1 year ago

you can tell if your navigable was responsible for the navigation or if another navigable instructed you to navigate

Sorry, I'm having trouble finding this in the spec. Can you point me to it?

This isn't just about clickjacking, it's also just about making a privileged navigation, which doesn't necessarily require a click.

Right, it's about any navigation IIUC. The reason I'm using clickjacking as an example is that by definition, it means the user has clicked on something that caused the same-origin navigation, rather than the attacker's script having directly initiated the navigation. So that looks the same to the user agent as "legitimate"/trustworthy navigations, IIUC, which makes it difficult to know when it's actually safe to skip clearing the state.

annevk commented 1 year ago

sourceDocument argument of https://html.spec.whatwg.org/#navigate. (This used to be a browsing context but that was factually wrong as browsing contexts (now mostly navigables) have no authority and the argument was used for authority-related decisions. For this decision interestingly enough we do mainly care about the navigables involved I think, but you can get to one from a document so that's all good.)

(And yeah, such a check won't help with XSS, but it will help with the scenarios @arturjanc outlined.)