Consider restricting storage access scope back to per-frame

privacycg / storage-access

The Storage Access API

https://privacycg.github.io/storage-access/

199 stars 26 forks source link

Consider restricting storage access scope back to per-frame #122

Closed johannhof closed 1 year ago

johannhof commented 1 year ago

We've had a lot of discussions in other places, e.g. at TPAC in the past weeks that focused on how we can improve interop, the API design and security properties of the Storage Access API, in light of recent publications such as rSAFor and new security considerations for SAA.

One central idea that stuck with some of us is to reconsider some of the design choices that were made for rSA over time, particularly to revisit per-page storage access and move back to per-frame (I think @martinthomson brought this up first). There are various reasons why I think this is a good idea:

As noted in #113, security considerations from per-page storage access would likely motivate us to enact restrictions on rSA that would reduce its utility for the per-page use case anyway, unless some opt-in signal from the first and/or third party is given. Most of these restrictions would no longer be necessary with per-frame
Per-page storage access vastly extends on the API target use case of "authenticated embeds", and most of us would like to focus on this use case for rSA.
Furthermore, the rSA API is actually quite inconvenient for providing per-page storage access as a compatibility measure. Sites that would not previously use iframes for their 3P resources would have to build weird constructs that "trick" users into clicking an iframe to open up storage access for e.g. images.
The new per-frame design for rSA I'm imagining would ensure that adjacent same-origin iframes would retain the ability to get promptless storage access after an initial user grant. This may be even more convenient for developers than working around the restrictions suggested in #113.
With rSAFor, we have a successor proposal for a page-wide "compatibility mode" API that we could work on and evolve with that specific intent and design goal, giving both APIs and use cases a cleaner separation.
As a neat side effect, we would resolve almost all of the remaining interop challenges through aligning the spec and browsers with roughly what is shipping in Safari.

A few, probably incomplete, thoughts on how this would work:

Browsers store a (top-level site, embedded origin) keyed permission (#121) that designates whether a user (agent) made the choice to allow or disallow storage access for that combination. Note that this does not grant implicit storage access as Firefox and Chromium would do right now, but instead signals that calls to rSA with that key would be auto-granted (see below).
Any iframe document that wants to gain storage access needs to call requestStorageAccess. This would be auto-granted when the permission is present, i.e. the user had made the previous choice to allow (how long that permission is stored is implementation-defined). The storage access grant is designated through a flag on the browsing context and thus applies to future same-site navigations inside the same iframe. I believe that this is very close to what Safari does today.
This means that when one iframe requests storage access, adjacent iframes will not get access automatically, but will be able to call rSA to get promptless access. As a result of this change I think we should be motivated to investigate further into #55 and I believe that integration with the Permissions API should give us that possibility through observing the permission.
Most other ideas such as maintaining storage access through continued user interaction as planned in #5 or potential future bucket-based localstorage access (#102) would still be compatible with this plan.

cc @annevk @bvandersloot-mozilla @mreichhoff @cfredric @johnwilander

annevk commented 1 year ago

One thing I noted in an offline conversation is that we'd have to figure out how workers (all of them) end up working if we do this.

Also, given the threat model #113 outlines, how did prior "per-frame" actually work? If document A embeds document B which embeds document C. And B succeeds with rSA. And then C fetches a resource B1. Does that fetch include credentials?

bvandersloot-mozilla commented 1 year ago

Ditto to Anne's question, with the additional thought experiment: document A embeds document B embeds document B* which fetches a resource B1. Does this fetch include credentials?

johannhof commented 1 year ago

I don't remember tbh and it seems like this is up to our own preference, we should really write tests for these cases though. Even the simple B embeds B1 case isn't 100% defined AFAICT.

From a security point of view it seems safest to say that cross-site ancestor break "implicit" storage access (but same-site iframe chains preserve it?).

cfredric commented 1 year ago

I've been thinking about this and getting ready to write a PR, and I wanted to flesh out the conversation a bit around the edge cases - especially navigations and session history (I haven't thought about workers yet).

The way I see it, there are 2 obvious candidate designs for achieving "per-frame" semantics:

Add some state to the BrowsingContext that corresponds to the frame.
Add some state to the Document that is being shown in the frame.

This state will have to be plumbed around to the same places in order to influence whether cookies are actually accessible during a fetch/JS execution (regardless of where the state is stored), so the choice only gets interesting when we think about the lifetime of that data, how/when it's reset, etc.

If we say the state should be on Document (or somewhere that's 1:1 with Document, whatever), then:

Same-origin navigations won't get implicit storage access while loading.
- I think this is good from a security standpoint, since it means that the entire site doesn't become vulnerable just because a single page opted into using storage access; each document has to individually opt in before credentials are sent.
- However, it's tougher for site developers, since it prevents documents from sending credentials when loading subresources during the initial load -- they'd be forced to call requestStorageAccess first, by design.

If we say that the state should be on the BrowsingContext, then:

Same-origin navigations may have implicit storage access during the initial load, which is nice for site developers, but could be bad from a security standpoint: if an attacker can navigate a victim frame (with storage access) to a sensitive endpoint, that new endpoint will have storage access before it opts in. (I'm know I'm hand-waving here; I'm not great as an exploit developer, but this just feels like it goes against the security goals.)
If the frame refreshes, it'll have implicit storage access during the load. This is nice for devs, since they can call requestStorageAccess and then reload in order to re-issue their subresource requests.
If the top-level document refreshes, however, all the child BrowsingContexts are wiped out, so none of the frames get implicit storage access during the refresh. This feels like a sharp corner for site devs, since it makes the behavior less predictable: if their frame is refreshed at one level, things work, but if any of the parent documents refresh, their frame doesn't work anymore.
IIUC (and I may not, please correct me), the HTML spec reuses BrowsingContexts during cross-site navigations. So, if a frame obtains storage access and then does a cross-site navigation, and a browser bug prevents the has_requested_storage_access state from being reset properly, then the new document will get storage access without having asked for it (which is a security issue). Importantly, the browser has "failed open" due to the bug. (Whereas if state were stored on the Document, then the effect of the equivalent bug would be to deny storage access to a doc that should otherwise get it, i.e. to "fail closed".)
There are also security-sensitive same-origin navigations, e.g. from a document with storage access to some document with Content-Security-Policy: sandbox set. During such a navigation the browser needs to reset the has_requested_storage_access state so that the sandboxed document doesn't inherit it. A bug here would similarly "fail open". (The same argument applies to any new security feature that impacts cookie access: the feature implementations need to explicitly opt out of inheriting the implicit storage access, which feels to me like the wrong approach for security.)

So, all this is to say that I'd rather put the new state on Document, rather than on BrowsingContext. This choice leaves a pain point, namely that there's no way to give a document storage access while it is loading (before script execution). I think we can solve that by using a response header (maybe a new Supports-Loading-Mode variant, or a new boolean-valued header) that indicates "give this doc storage access, if a grant already exists". That still requires per-document opt-in so it fits with the security model, and it enables the "I need to load credentialed subresources" use case in a more consistent (and less annoying) way than the load-then-requestStorageAccess-then-reload workflow.

So, before I take a stab at writing a PR - does using Document (instead of BrowsingContext) sound reasonable to folks?

CC @arturjanc

annevk commented 1 year ago

Requiring requestStorageAccess() for each document indeed has some cost for certain navigation flows. I don't know what the state of current implementations is, but I imagine that might be hard to roll out successfully. I thought there was some past discussion about this, but having looked through past issues I can't seem to find it.

I think I'd like to see a lot more detail about the "response header" idea before it goes to a PR.

bvandersloot-mozilla commented 1 year ago

I think I'd like to see a lot more detail about the "response header" idea before it goes to a PR.

Me as well.

Playing the advocate of putting it on the BrowsingContext:

Same-origin navigations may have implicit storage access during the initial load, which is nice for site developers, but could be bad from a security standpoint: if an attacker can navigate a victim frame (with storage access) to a sensitive endpoint, that new endpoint will have storage access before it opts in. (I'm know I'm hand-waving here; I'm not great as an exploit developer, but this just feels like it goes against the security goals.)

I don't think partitioning should be relied upon as a security boundary in this way. If that were the case, a user-granted permission would be able to break the boundary. The sensitive document/endpoint should use SameSite cookies, frame-ancestors, or some other protection to keep it from being used in an inappropriate context.

If the frame refreshes, it'll have implicit storage access during the load. [...] If the top-level document refreshes, however, all the child BrowsingContexts are wiped out, so none of the frames get implicit storage access during the refresh. This feels like a sharp corner for site devs, since it makes the behavior less predictable: if their frame is refreshed at one level, things work, but if any of the parent documents refresh, their frame doesn't work anymore.

I don't think this is that sharp a corner, personally. The dev still has to deal with two primary cases: loading the document without initial storage access and with initial storage access. An outer document refresh is identical to an initial navigation in this case, which must be supported.

IIUC (and I may not, please correct me), the HTML spec reuses BrowsingContexts during cross-site navigations. [...] Importantly, the browser has "failed open" due to the bug. [...] (The same argument applies to any new security feature that impacts cookie access: the feature implementations need to explicitly opt out of inheriting the implicit storage access, which feels to me like the wrong approach for security.)

I would argue for changing the definition of a navigation to clear this state unless a set of conditions are met, rather than enumerating the cases where it would be cleared. This seems more safe and apt to "fail closed," although I concede that it is some, but less, risk that a new feature is created with a value that is a necessary condition for safety that is not added to this set of conditions.

bvandersloot-mozilla commented 1 year ago

Same-origin navigations may have implicit storage access during the initial load, which is nice for site developers, but could be bad from a security standpoint: if an attacker can navigate a victim frame (with storage access) to a sensitive endpoint, that new endpoint will have storage access before it opts in. (I'm know I'm hand-waving here; I'm not great as an exploit developer, but this just feels like it goes against the security goals.)

I don't think partitioning should be relied upon as a security boundary in this way. If that were the case, a user-granted permission would be able to break the boundary. The sensitive document/endpoint should use SameSite cookies, frame-ancestors, or some other protection to keep it from being used in an inappropriate context.

I recognize there is subtlety here that ties into discussion on #113, particularly here. To articulate more clearly: I don't think partitioning should be used as a document-to-document boundary. We can retain the same-origin policy protections by clearing this bit on cross-origin navigations.

cfredric commented 1 year ago

Requiring requestStorageAccess() for each document indeed has some cost for certain navigation flows. I don't know what the state of current implementations is, but I imagine that might be hard to roll out successfully. I thought there was some past discussion about this, but having looked through past issues I can't seem to find it.

Out of curiosity, do you (or @bvandersloot-mozilla) have any usage metrics on how often those navigation flows are hit and reuse some existing storage access grant?

To articulate more clearly: I don't think partitioning should be used as a document-to-document boundary. We can retain the same-origin policy protections by clearing this bit on cross-origin navigations.

Fair point. I admit that while thinking about this, I was assuming a future in which we also relax the permission key to <top level site, embedded site>, which arguably would be safe as long as each frame is required to actively opt in to using that permission. So in these attacks I was also thinking about implicit storage access on same-site, cross-origin navigations. If we clear the state for all cross-origin navigations (thereby removing implicit storage access for a same-site navigation, but still allowing promptless reuse of an existing grant), that solves the security issue - or at least restores the origin as the security boundary, which is consistent with most of the rest of the web platform. That definitely feels better to me.

If the frame refreshes, it'll have implicit storage access during the load. [...] If the top-level document refreshes, however, all the child BrowsingContexts are wiped out, so none of the frames get implicit storage access during the refresh. This feels like a sharp corner for site devs, since it makes the behavior less predictable: if their frame is refreshed at one level, things work, but if any of the parent documents refresh, their frame doesn't work anymore.

I don't think this is that sharp a corner, personally. The dev still has to deal with two primary cases: loading the document without initial storage access and with initial storage access. An outer document refresh is identical to an initial navigation in this case, which must be supported.

Part of my worry here is about the composability of authenticated embeds: Suppose A embeds B which embeds C, and B and C are both authenticated embeds. A refreshes, which wipes out the implicit storage access state for B and C. After loading, B's script detects that it doesn't have storage access, so it calls requestStorageAccess(), then refreshes the frame. (C's script has meanwhile done the same thing, or has at least started to.) B finishes loading with storage access. C finishes loading, but since B refreshed, it didn't get to load with storage access, as its state was wiped out again. So C's script calls requestStorageAccess() a third time, and reloads the frame. The page is finally ready and working properly, after the 3 documents were loaded up to 6 times.

I admittedly don't have a particular use case in mind that would run into this cascading refresh problem, so maybe it's academic. But regardless, I wanted to avoid imposing a penalty on composition, given that composability is a key feature of the web. (Perhaps this is something that a response header could fix in the future, and doesn't have to be fixed now.)

bvandersloot-mozilla commented 1 year ago

[...] The page is finally ready and working properly, after the 3 documents were loaded up to 6 times.

Ouch, I didn't think of that case. That is a sharp corner. I wouldn't write it off as academic outright. There are better choices the developer could make here with the bit on the BrowsingContext, but I don't know if it would be obvious enough to not do it this way.

cfredric commented 1 year ago

Thanks for the discussion in the editors' meeting yesterday. To reiterate some of that discussion here:

Putting state in BrowsingContext (rather than Document) seems reasonable from a security standpoint as long as the state is keyed by origin, i.e. that same-site (but cross-origin) documents don't use the same state.
A response header (e.g. Supports-Loading-Mode: with-storage-access) that indicates "please load this document's subresources and execute its scripts as if a call to requestStorageAccess had already been executed, if a call to requestStorageAccess would be promptlessly resolved" might solve the composability pain that I described above, by avoiding some need for reloads. We'll see if web developers express a need for that.
A response header (e.g. Requires-Storage-Access?), similar to a Critical Client Hint, that indicates "this endpoint returns a different resource with storage access vs without; please retry (with cookies) if a call to requestStorageAccess would promptlessly resolve" could address a related problem, where the requested document itself (as opposed to its scripts/subresources) requires the credentials.
- This would resolve #130.
- This is also related to #131, but doesn't totally resolve it, since it doesn't allow the server to trigger the prompt without running JS on the client.
- Note that this header, on its own, does not address the composability concern I raised, since the document itself isn't loaded as if it has already called requestStorageAccess(). We could make that behavior part of this header if desired, or keep the headers orthogonal and just expect them to be combined frequently.

One point I wanted to revisit is that of "clearing" state on cross-origin navigations. Anne made the point that the state we're storing could be some type of lookup map (keyed by origin). Given that the state is keyed by origin, I don't think we actually need to clear an origin's state in the BrowsingContext when it does a cross-origin navigation; that way if the user navigates back, or visits a different document on that origin in that BC, they'll still benefit from the implicit storage access. (There's a security subtlety here, where an attacker could navigate a BC to some sensitive origin and load it with implicit storage access iff that BC had already visited that origin and gotten storage access; but this adheres to the SOP, so I think it's ok.) So, this map will grow monotonically over the lifetime of a BC.

Assuming that summary sounds accurate to you all, I'll continue reading and thinking about how to spec this. (I have yet to look into the Fetch spec, and figure out how workers relate to all of this.)

arturjanc commented 1 year ago

I don't have a strong opinion on storing state in the BrowsingContext vs. Document, but I think that, as described, the BrowsingContext approach has some undesirable security properties. Specifically, the embedder's ability to navigate an iframe that was granted storage access to arbitrary same-origin endpoints with credentials re-enables a number of attacks which are prevented by default after the removal of third-party cookies; this includes:

Clickjacking
Attacks on unsafe postMessage implementations; on sites with Cross-Origin-Opener-Policy: same-origin which disallows property access from cross-origin windows, endpoints that don't protect themselves from embedding will be able to receive messages from their embedder and might leak data unless they correctly validate the sender.
Extracting the contents of iframes using hardware-level attacks on browsers/platforms without out-of-process iframes.

Note that these attacks are possible today in browsers with third-party cookies if a site sets its auth cookies as SameSite=None and doesn't protect itself against embedding by using X-Frame-Options/frame-ancestors. But we should think about the long-term stable state of the platform where the attacks mentioned above are prevented by default; we don't want the Storage Access API to be a footgun that re-enables these attacks, especially considering that embedding is allowed by default in the platform and doesn't require explicit opt-in.

There's a security subtlety here, where an attacker could navigate a BC to some sensitive origin and load it with implicit storage access iff that BC had already visited that origin and gotten storage access; but this adheres to the SOP

It seems like a bit of a stretch to use the SOP as an argument for allowing a storage access grant for one document to permit authenticated embedding of an unrelated same-origin document (the SOP generally gives documents with script execution capabilities access to all of the origin's data, but it doesn't make all security mechanisms origin-scoped).

A better analogy might be if the CORS Access-Control-Allow-Credentials: true response header on one endpoint allowed credentialed access to other endpoints in the same origin, which I think is fairly clearly not what we'd want.

Overall, I don't want to push back on the BrowsingContext-based approach because it seems like it has reasonable ergonomic benefits for developers, but I'd like us to try to not make the Storage Access API recreate the problem of isolation bypasses for origins that use it. For example, if we attach state to the BrowsingContext, could we make it so that cross-BC navigations of the BrowsingContext (e.g. the embedder directly navigating the iframe) would clear the storage access permission, or something along these lines?

annevk commented 1 year ago

@arturjanc note that SameSite=None cookies are also vulnerable to A1 -> B -> A2 embedding scenarios. I like the idea of resetting state when the browsing context (or maybe navigable) was tampered with from the outside.

cfredric commented 1 year ago

note that SameSite=None cookies are also vulnerable to A1 -> B -> A2 embedding scenarios.

I think we can simplify the attack model by removing the navigation to/from B, so it's just A1 -> A2. An attacker could accomplish this via clickjacking.

I like the idea of resetting state when the browsing context (or maybe navigable) was tampered with from the outside.

Does the web platform have a way of distinguishing which same-origin navigations should be considered "safe", and which should have their BC state cleared? I.e., what's the definition of "was tampered with from the outside"?

I think there's no (100% reliable) way to distinguish a "legitimate" A1 -> A2 navigation from one that was caused by an attacker, since they both may be caused by user activity (knowingly or not). So we can't say (e.g.) that it's safe to not clear state if the navigation was caused by a user's click, rather than by a script running in some other frame.

If clickjacking is in the threat model, then it seems to me that all cross-document navigations are potentially unsafe (even if they're same-origin), and we would have to clear the state in those cases. So we could only reuse the BC state during refreshes or navigations to the same URL.

annevk commented 1 year ago

@cfredric with embedding I meant A1 frames B which frames A2 (to attack it). The As are same-site and the B is cross-site. (Sorry for assuming familiarity with the abbreviated description.)

Does the web platform have a way of distinguishing which same-origin navigations should be considered "safe", and which should have their BC state cleared?

Yeah kinda, you can tell if your navigable was responsible for the navigation or if another navigable instructed you to navigate. In the latter case you could assume malice for the purposes of this feature. (This isn't just about clickjacking, it's also just about making a privileged navigation, which doesn't necessarily require a click.)

cfredric commented 1 year ago

you can tell if your navigable was responsible for the navigation or if another navigable instructed you to navigate

Sorry, I'm having trouble finding this in the spec. Can you point me to it?

This isn't just about clickjacking, it's also just about making a privileged navigation, which doesn't necessarily require a click.

Right, it's about any navigation IIUC. The reason I'm using clickjacking as an example is that by definition, it means the user has clicked on something that caused the same-origin navigation, rather than the attacker's script having directly initiated the navigation. So that looks the same to the user agent as "legitimate"/trustworthy navigations, IIUC, which makes it difficult to know when it's actually safe to skip clearing the state.

annevk commented 1 year ago

sourceDocument argument of https://html.spec.whatwg.org/#navigate. (This used to be a browsing context but that was factually wrong as browsing contexts (now mostly navigables) have no authority and the argument was used for authority-related decisions. For this decision interestingly enough we do mainly care about the navigables involved I think, but you can get to one from a document so that's all good.)

(And yeah, such a check won't help with XSS, but it will help with the scenarios @arturjanc outlined.)