privacycg / storage-access

The Storage Access API
https://privacycg.github.io/storage-access/

Improving the Storage Access API security model #113

Closed arturjanc closed 1 year ago

arturjanc commented 1 year ago

As part of a review of the Storage Access API design, @johannhof and I have identified concerns about cross-site information leakage that reduce the security benefit of third-party cookie restrictions. We'd like to suggest improvements to the API’s security model that will mitigate and ideally resolve these concerns.

A brief summary of the issues:

  1. After an iframe receives storage access, unrelated cross-origin iframes embedded by the top-level site will be able to start sending credentialed requests to the iframe's origin.
    • In a browser that has restricted access to cross-site cookies, this both reveals to other iframes that storage access was granted (it's generally possible to infer whether a cross-site request was sent with credentials by loading resources that require authentication) and exposes the origin that received storage access to traditional cross-site attacks, such as CSRF, clickjacking, XS-Search and other XS-leaks.
      • Sandboxed frames are not restricted either: untrusted content that the top-level site embeds in a sandboxed iframe also gains the ability to send credentialed cross-site requests to the entity that received storage access.
  2. The top-level embedder gains the ability to make credentialed requests to arbitrary endpoints in the origin which received storage access. This opens up origins which use the Storage Access API to attacks from their embedders.
  3. The scope of a Storage Access grant is overly broad: after storage access is granted, all requests to the embeddee's origin or site (depending on the browser) will start carrying cookies. This includes requests to endpoints that are unrelated to functionality that depends on authentication in third-party contexts. In some browsers, the access grant also exposes cookies by default, i.e. it attaches to cross-site requests all cookies that weren't explicitly set as SameSite=Lax or SameSite=Strict, including cookies with no SameSite attribute at all.
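A minimal TypeScript sketch can make the three problems concrete. It models the current per-page, site-scoped behavior as described in this issue, not any browser's actual implementation; `siteOf`, `PageState`, `carriesCookiesToday`, and the host monkeys.example are hypothetical, and a "site" is approximated as the last two host labels rather than computed from the Public Suffix List.

```typescript
// Simplified model of the CURRENT behavior described above (hypothetical,
// for illustration only). Assumptions: origins are plain strings such as
// "https://docs.bank.example", and a "site" is approximated as the last two
// host labels; real browsers use the Public Suffix List.
type Origin = string;

function siteOf(origin: Origin): string {
  const host = origin.replace(/^https?:\/\//, "").split(":")[0] ?? "";
  return host.split(".").slice(-2).join(".");
}

interface PageState {
  topLevel: Origin;
  // Sites for which *some* frame on this page has been granted storage access.
  grantedSites: Set<string>;
}

// Does a request to `target` (assumed cross-site to the top-level page) carry
// unpartitioned cookies? The requester is deliberately ignored: any frame on
// the page, sandboxed or not, and the embedder itself benefit from the grant
// (issues 1 and 2). The grant is keyed by site, not origin (issue 3).
function carriesCookiesToday(page: PageState, requester: Origin, target: Origin): boolean {
  void requester; // unused on purpose in the current model
  return page.grantedSites.has(siteOf(target));
}

// docs.bank.example obtains storage access on onlinestore.example ...
const page: PageState = {
  topLevel: "https://onlinestore.example",
  grantedSites: new Set<string>(),
};
page.grantedSites.add(siteOf("https://docs.bank.example"));

// ... and an unrelated third-party frame on the same page can now send
// credentialed requests to a *different* origin in the bank's site.
const leaked = carriesCookiesToday(page, "https://monkeys.example", "https://login.bank.example"); // true
```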

We put together a document that describes this in more detail: https://docs.google.com/document/d/1AsrETl-7XvnZNbG81Zy9BcZfKbqACQYBSrjM3VsIpjY/edit#

The document includes three proposals to improve various parts of the security model. In a nutshell, we'd like to:

  1. Prevent iframes from being able to use storage access obtained by a cross-origin sibling frame.
  2. Require CORS for any credentialed requests from the embedder to the embeddee.
  3. Reduce the scope of credentialed requests sent after storage access is granted.
    • This restricts both the scope of requests which will carry credentials (restricting it to only the embeddee's own origin, and not its entire site), and the cookies attached to cross-site requests, where we'd provide only cookies explicitly set as SameSite=None.
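The three proposals can be sketched as a single cookie-attachment decision. This is an illustrative TypeScript model under stated assumptions, not the spec's algorithm: the grant is keyed by <top-level site, grantee origin>, "require CORS" is interpreted as "requests from the embedder only carry cookies in CORS mode", delegation to other frames is left out, and all type and function names are hypothetical.

```typescript
// Simplified model of the three proposals above (hypothetical names).
// Assumptions: origins are strings, a "site" is the last two host labels,
// and "require CORS" means embedder requests carry cookies only in CORS mode.
// Delegation of access to other cross-origin frames is not modeled.
type Origin = string;
type SameSiteAttr = "None" | "Lax" | "Strict" | undefined; // undefined = attribute absent

interface StoredCookie { name: string; sameSite: SameSiteAttr; }
interface Grant { topSite: string; granteeOrigin: Origin; }
interface Req {
  topLevel: Origin;
  requester: Origin;           // origin of the document issuing the request
  requesterHasAccess: boolean; // did THIS document obtain storage access?
  target: Origin;
  mode: "cors" | "no-cors";
}

function siteOf(origin: Origin): string {
  const host = origin.replace(/^https?:\/\//, "").split(":")[0] ?? "";
  return host.split(".").slice(-2).join(".");
}

function cookiesToAttach(grants: Grant[], req: Req, jar: StoredCookie[]): StoredCookie[] {
  // (3) The grant is keyed by <top-level site, grantee ORIGIN>, not site.
  const granted = grants.some(
    (g) => g.topSite === siteOf(req.topLevel) && g.granteeOrigin === req.target,
  );
  if (!granted) return [];
  // (1) Only the grantee's own document may use the access ...
  const fromGrantee = req.requester === req.target && req.requesterHasAccess;
  // (2) ... or the top-level embedder, and then only via CORS.
  const fromEmbedder = req.requester === req.topLevel && req.mode === "cors";
  if (!fromGrantee && !fromEmbedder) return [];
  // (3) Only cookies explicitly marked SameSite=None are attached.
  return jar.filter((c) => c.sameSite === "None");
}

const grants: Grant[] = [
  { topSite: "onlinestore.example", granteeOrigin: "https://docs.bank.example" },
];
const jar: StoredCookie[] = [
  { name: "branch", sameSite: "None" },
  { name: "legacy", sameSite: undefined },
];
const base = { topLevel: "https://onlinestore.example", requesterHasAccess: true, mode: "no-cors" as const };

// The grantee frame gets only its SameSite=None cookie.
const own = cookiesToAttach(grants, { ...base, requester: "https://docs.bank.example", target: "https://docs.bank.example" }, jar);
// An unrelated frame gets nothing, and so does a sibling origin in the same site.
const sibling = cookiesToAttach(grants, { ...base, requester: "https://monkeys.example", target: "https://docs.bank.example" }, jar);
const otherOrigin = cookiesToAttach(grants, { ...base, requester: "https://login.bank.example", target: "https://login.bank.example" }, jar);
```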

Our expectation is that (1) and (3) above should have relatively little impact on existing uses of requestStorageAccess, while the CORS requirement may require developers making requests to the embeddee from the context of their 1P embedder to refactor their code and enable CORS for affected resources.
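For the CORS requirement in (2), the embedder would issue something like `fetch(url, { mode: "cors", credentials: "include" })`, and the embeddee has to opt in on each affected resource. The sketch below shows the response check that a credentialed CORS request is subject to, simplified from the Fetch standard (preflights, exposed headers, and caching are omitted); the function name and example hosts are hypothetical.

```typescript
// Simplified CORS check for the response to a credentialed request, following
// the Fetch standard's rules (preflight, exposed headers and caching omitted).
// Header names are assumed to be lowercased.
type ResponseHeaders = Record<string, string | undefined>;

function credentialedCorsAllowed(requestOrigin: string, headers: ResponseHeaders): boolean {
  const allowOrigin = headers["access-control-allow-origin"];
  // With credentials, the wildcard "*" is rejected: the server must echo the
  // requesting origin exactly.
  if (allowOrigin === "*" || allowOrigin !== requestOrigin) return false;
  // The server must also explicitly opt in to sharing credentialed responses.
  return headers["access-control-allow-credentials"] === "true";
}

// Hypothetical embeddee response that opts in for a single embedder.
const embedder = "https://onlinestore.example";
const optedIn: ResponseHeaders = {
  "access-control-allow-origin": embedder,
  "access-control-allow-credentials": "true",
  vary: "Origin",
};
const wildcard: ResponseHeaders = {
  "access-control-allow-origin": "*",
  "access-control-allow-credentials": "true",
};

const allowedForEmbedder = credentialedCorsAllowed(embedder, optedIn); // true
const allowedWithWildcard = credentialedCorsAllowed(embedder, wildcard); // false
```

Echoing a single, explicitly listed origin (plus `Vary: Origin`) is what makes this an opt-in per embedder rather than a blanket exposure.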

For context, our review was done as part of evaluating the recent requestStorageAccessForSite proposal, and the mitigations that we are proposing will address some important concerns listed in its Privacy and Security Considerations. However, we think that these issues should be addressed regardless of consensus on requestStorageAccessForSite.

We'd like to reach consensus about these changes across browser vendors soon, to prevent having to make backwards-incompatible changes after the API gains more adoption.

/cc @annevk @johnwilander

iambmelt commented 1 year ago

From an embedded application & SSO perspective, the functionality described here sounds undesirable. In Microsoft Teams and Office, 1P/3P developers may embed applications running across different origins that we'd like to grant cookie access to (for SSO with our IdP/STS) based on a single SAA prompt, rather than multiple prompts or prompts on a per-frame basis. Multiple prompts would make for a highly disruptive user experience in such cases.

/cc @jasonnutter and @timcappalli who I believe shared similar feedback offline

arturjanc commented 1 year ago

Can you clarify which of the specific restrictions above (from the second list) will have the effect of causing multiple prompts in your setup? I assume you mean (3) where storage access would be scoped to the requesting origin instead of the site?

johnwilander commented 1 year ago

I don’t think multiple prompts for the same pair of top and embedded site is a thing in any of the implementations. It certainly isn’t in WebKit. Once the user has opted in for a pair, that is remembered and there will be no prompt.

arturjanc commented 1 year ago

That's true, but the suggestion in (3) is to scope the grant to <embedding site, grantee origin> as opposed to <embedding site, grantee site> -- which I believe matches the current scope in Firefox, but not WebKit. On pages that embed multiple frames from the same site, but different origins (e.g. site.example embedding widget1.site2.example and widget2.site2.example) this would result in multiple prompts.
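The difference between the two grant scopes can be illustrated by counting distinct grant keys, which roughly corresponds to the number of separate decisions (and potential prompts) a user agent would need. This is a hypothetical sketch; `grantKey`, `distinctGrants`, and the last-two-labels approximation of a site are assumptions, not any browser's logic.

```typescript
// Hypothetical sketch: how the grant key scope affects the number of grants.
// A "site" is approximated as the last two host labels (browsers use the PSL).
type Scope = "origin" | "site";

function siteOf(origin: string): string {
  const host = origin.replace(/^https?:\/\//, "").split(":")[0] ?? "";
  return host.split(".").slice(-2).join(".");
}

// Key under which a storage access grant (and thus a prompt decision) is stored:
// <embedding site, grantee origin> or <embedding site, grantee site>.
function grantKey(topLevel: string, embedded: string, scope: Scope): string {
  const grantee = scope === "site" ? siteOf(embedded) : embedded;
  return `${siteOf(topLevel)} -> ${grantee}`;
}

// Number of distinct grants needed for a set of embedded frames.
function distinctGrants(topLevel: string, embeddedFrames: string[], scope: Scope): number {
  return new Set(embeddedFrames.map((f) => grantKey(topLevel, f, scope))).size;
}

const widgetFrames = ["https://widget1.site2.example", "https://widget2.site2.example"];
const perOrigin = distinctGrants("https://site.example", widgetFrames, "origin"); // 2
const perSite = distinctGrants("https://site.example", widgetFrames, "site"); // 1
```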

johnwilander commented 1 year ago

I doubt we’d prompt per subdomain if we’ve granted for one of them under the same registrable domain. We would probably just grant on interaction + API call.

arturjanc commented 1 year ago

The concern with this approach is that it reduces security by exposing unrelated same-site-but-cross-origin applications to cross-site attacks after storage access is granted to any document within their site. The consequence of this behavior is that third-party cookie deprecation would lose much of its security value.

As an example, there were places where browsers made the assumption that the removal of 3p cookies would eliminate certain cross-origin information leaks -- unfortunately, this assumption will no longer hold when storage access granted to a sibling subdomain re-enables such leaks against the entire site (which can include more sensitive origins than the one which requested storage access).

arturjanc commented 1 year ago

Just to illustrate this with a quick example:

  1. I run bank.example, which includes login.bank.example (with sensitive data) as well as docs.bank.example, which hosts an embeddable widget that can ask for storage access to remember the location of a user's local branch.
  2. onlinestore.example iframes docs.bank.example as part of its checkout flow (where the iframe may ask for storage access). It also has a clickthemonkey.onlinestore.example marketing site, which embeds third-party iframes with the best monkey images on the web, based on URLs sent by users.

In the current model, clickthemonkey.onlinestore.example as well as all of the iframes it embeds will receive the ability to send credentialed requests to login.bank.example and will be able to learn when the user is logged into the bank, as well as exploit XS-leaks, CSRF and related vulnerabilities in that origin.

jasonnutter commented 1 year ago

> Can you clarify which of the specific restrictions above (from the second list) will have the effect of causing multiple prompts in your setup? I assume you mean (3) where storage access would be scoped to the requesting origin instead of the site?

(1), where cross-origin subframes can't take advantage of the storage access given to the embedded origin from the top frame.

iambmelt commented 1 year ago

> > Can you clarify which of the specific restrictions above (from the second list) will have the effect of causing multiple prompts in your setup? I assume you mean (3) where storage access would be scoped to the requesting origin instead of the site?

> (1), where cross-origin subframes can't take advantage of the storage access given to the embedded origin from the top frame.

@arturjanc ^^ this behavior, thanks @jasonnutter

jasonnutter commented 1 year ago

> In the current model, clickthemonkey.onlinestore.example as well as all of the iframes it embeds will receive the ability to send credentialed requests to login.bank.example and will be able to learn when the user is logged into the bank, as well as exploit XS-leaks, CSRF and related vulnerabilities in that origin.

Can you please clarify how you envision clickthemonkey.onlinestore.example learning the user is logged into login.bank.example in this example? I can think of three channels: postMessage, redirects, or fetch/XHR. The first two would require explicit action by login.bank.example (i.e. it would be the responsibility of login.bank.example to verify the embedding domain [via preregistration and/or user consent] before sending it any sensitive information).

jasonnutter commented 1 year ago

In our world as an IDP, any site can embed our authorize endpoint (with prompt=none), but even if the user is logged in (and 1p cookies for that session are available in the iframe), authentication artifacts only become accessible to the relying party once the iframe has been redirected back to the provided redirect_uri. That redirect_uri must be same-origin with the embedding site and registered ahead of time, and the user (or an admin on their behalf) must have consented for that relying party.
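The IdP-side rule described above can be sketched as two independent checks: whether the IdP will issue artifacts at all, and whether the embedding page can read the iframe after the redirect. This is a hypothetical TypeScript model; `ClientRegistration`, `mayReleaseArtifacts`, `embedderCanRead`, and the example hosts are illustrative names, not any real IdP's API.

```typescript
// Hypothetical model of the IdP rule above; not a real IdP's API.
interface ClientRegistration {
  clientId: string;
  redirectUris: string[]; // registered ahead of time
}

// "https://app.example/cb?x=1" -> "https://app.example" (simplified parser).
function originOf(url: string): string {
  const match = /^(https?:\/\/[^/?#]+)/.exec(url);
  return match?.[1] ?? "";
}

// The IdP issues artifacts only for a registered redirect_uri, a signed-in
// user, and existing user or admin consent for this relying party.
function mayReleaseArtifacts(
  client: ClientRegistration,
  redirectUri: string,
  hasConsent: boolean,
  userSignedIn: boolean,
): boolean {
  return userSignedIn && hasConsent && client.redirectUris.includes(redirectUri);
}

// After the redirect, the embedding page can read the iframe (and thus the
// artifacts) only if the redirect_uri is same-origin with it.
function embedderCanRead(embedderOrigin: string, redirectUri: string): boolean {
  return originOf(redirectUri) === embedderOrigin;
}

const rp: ClientRegistration = {
  clientId: "example-rp",
  redirectUris: ["https://app.example/auth/callback"],
};
const callback = "https://app.example/auth/callback";
const issued = mayReleaseArtifacts(rp, callback, true, true); // true
const readableByRp = embedderCanRead("https://app.example", callback); // true
const readableByOther = embedderCanRead("https://attacker.example", callback); // false
```

Note that this only protects the artifacts themselves; it says nothing about cross-site signals such as whether the authorize request carried cookies.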

jasonnutter commented 1 year ago

In the POC, it looks like the attacking site is embedding a javascript file for the given embedded domain that has been granted storage access, and then console.logging the incoming cookies (seemingly without any verification of the parent domain), correct?

My perspective is that if you are a website that can be embedded, it is your responsibility to verify who is embedding you before transmitting any sensitive information, either via only allowing certain ancestors via CSP (which isn't always practical) or by explicitly verifying the embedding origin (e.g. inspecting referrer headers and/or only returning artifacts to trusted origins via redirects).
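The two embedder checks mentioned here could look roughly like the following. `frame-ancestors` is a real CSP directive; the allowlist, the host names, and the helper functions are hypothetical. The Referer-based check is only a heuristic, since referrer policy can reduce the header to an origin or drop it entirely.

```typescript
// Hypothetical allowlist of trusted embedders (illustrative hosts).
const trustedEmbedders = ["https://teams.example", "https://office.example"];

// 1. Let the browser enforce it: with CSP frame-ancestors, the page is not
//    rendered in a frame if any ancestor is missing from the list.
function frameAncestorsHeader(allowed: string[]): [string, string] {
  return ["Content-Security-Policy", `frame-ancestors 'self' ${allowed.join(" ")}`];
}

// 2. Check on the server before returning anything sensitive. A missing
//    Referer (e.g. due to referrer policy) is treated as untrusted.
function isTrustedEmbedder(referer: string | undefined, allowed: string[]): boolean {
  if (referer === undefined) return false;
  const m = /^(https?:\/\/[^/?#]+)/.exec(referer);
  return m !== null && allowed.includes(m[1] ?? "");
}

const [headerName, headerValue] = frameAncestorsHeader(trustedEmbedders);
// headerValue: "frame-ancestors 'self' https://teams.example https://office.example"
```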

arturjanc commented 1 year ago

> (1), where cross-origin subframes can't take advantage of the storage access given to the embedded origin from the top frame.

Could you provide a more specific example of the origins/sites that are interacting in this way in your use case?

The reason I'm asking is that allowing this is arguably the most problematic behavior from a security perspective. Understanding why a frame that's cross-site to both the top-level site and the origin which received storage access needs to send credentialed requests to the grantee is important for figuring out the best solution here.

Note that the summary in the text above is a bit of a mental shortcut, for which I apologize. The actual proposal in the doc linked above talks about using Permissions Policy with a default of 'self', where storage access could be delegated to cross-origin frames by the top-level site if needed.
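One reading of the Permissions Policy idea, sketched in TypeScript: a grant is usable by documents same-origin with the grantee (the 'self' default), and by a cross-origin frame only if the top-level site delegated it, e.g. via an iframe `allow` attribute. The feature name `storage-access`, the decision to treat 'self' as relative to the grantee, the flat (non-nested) frame model, and all identifiers are assumptions for illustration, not the proposal's exact semantics.

```typescript
// Hypothetical model: who may use a storage access grant on a page.
interface FrameInfo {
  origin: string;
  // Value of the <iframe allow="..."> attribute set by the top-level site.
  allowAttr: string;
}

// Extracts feature names from an allow attribute, e.g.
// "storage-access; camera 'src'" -> ["storage-access", "camera"].
function parseAllow(attr: string): string[] {
  return attr
    .split(";")
    .map((directive) => directive.trim().split(/\s+/)[0] ?? "")
    .filter((feature) => feature.length > 0);
}

// Default allowlist 'self' (assumed relative to the grantee), plus explicit
// delegation by the top-level site. Nested frame chains are ignored.
function canUseGrant(frame: FrameInfo, granteeOrigin: string): boolean {
  if (frame.origin === granteeOrigin) return true;
  return parseAllow(frame.allowAttr).includes("storage-access");
}

const grantee = "https://idp.example";
const idpFrame: FrameInfo = { origin: "https://idp.example", allowAttr: "" };
const addinFrame: FrameInfo = { origin: "https://addin.example", allowAttr: "storage-access" };
const adFrame: FrameInfo = { origin: "https://monkeys.example", allowAttr: "camera" };
// canUseGrant: idpFrame -> true, addinFrame -> true (delegated), adFrame -> false
```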

arturjanc commented 1 year ago

> Can you please clarify how you envision clickthemonkey.onlinestore.example learning the user is logged into login.bank.example in this example?

clickthemonkey.onlinestore.example can observe when requests carry cookies based on behavior of endpoints in the target origin that are visible cross-origin. This is briefly covered under the note about preconditions in the Summary of problems with the requestStorageAccess design and current implementations section of the doc above, but it's useful to elaborate.

There are multiple ways in which applications reveal whether a request is made with cookies:

  1. Any resource that returns an error when the user is not logged in, but loads correctly when the user is logged in.
  2. Any resource that results in a redirect when the user is not logged in. This is common if a request to /profile redirects to a login page for uncredentialed requests.
  3. Any resource that the application allows to be shared with other users. An attacker can share a resource only with a victim user (or group of users) and observe whether it loads using the same technique as in (1).
  4. Any resource with a timing difference depending on whether the user is logged in. For example, if /api/exportdata returns {} for an uncredentialed request, but a lot of data for an authenticated one.

These are just the basic, fairly general patterns which reveal if a user was logged in. There are others, e.g. load timings of any responses that return a Vary: Cookie header will also reveal when a request starts carrying cookies (due to the necessity to load the resource from the network instead of the cache).
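Patterns (1) and (2) above reduce to a one-bit oracle. The hypothetical model below shows why no response body needs to be read: the attacker only observes whether an element such as `<img>` fires `load` or `error`. The endpoint, its behavior, and the coarse event model are illustrative assumptions.

```typescript
// Hypothetical model of an endpoint whose response depends on whether the
// request carried a session cookie, e.g. GET /profile/avatar.png.
interface SimResponse {
  code: number;        // status of the final response the element receives
  contentType: string;
}

function avatarEndpoint(hasSessionCookie: boolean): SimResponse {
  return hasSessionCookie
    ? { code: 200, contentType: "image/png" }
    : { code: 302, contentType: "text/html" }; // redirect ending at an HTML login page
}

// Coarse model of what a cross-site <img> exposes: only load vs. error.
function imgEvent(res: SimResponse): "load" | "error" {
  return res.code === 200 && res.contentType.startsWith("image/") ? "load" : "error";
}

// Any difference between these two values is a cross-site login oracle.
const withCookie = imgEvent(avatarEndpoint(true)); // "load"
const withoutCookie = imgEvent(avatarEndpoint(false)); // "error"
```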

> In the POC, it looks like the attacking site is embedding a javascript file for the given embedded domain that has been granted storage access, and then console.logging the incoming cookies (seemingly without any verification of the parent domain), correct?

Yes, this is just a debugging feature in the PoC to show which cookies were present, and is not what a real application would do. The actual patterns that reveal whether a request was made with cookies are the ones listed above.

arturjanc commented 1 year ago

> My perspective is that if you are a website that can be embedded, it is your responsibility to verify who is embedding you before transmitting any sensitive information, either via only allowing certain ancestors via CSP (which isn't always practical).

This is unfortunately too optimistic an assumption to depend on in this case.

First, it's important to remember that here we're talking about login.bank.example, which is not meant to be embeddable; it just happens to be same-site with docs.bank.example, which received storage access. Second, login.bank.example is safe by default against cross-site information leaks in browsers which disable third-party cookies or which default to SameSite=Lax cookies. Only when an unrelated sibling subdomain (docs.bank.example) receives storage access does login.bank.example become exposed to attacks, because in the current model unrelated third parties can then start sending arbitrary credentialed requests to it. This is the problematic behavior of the Storage Access API which this proposal aims to address.

jasonnutter commented 1 year ago

> Could you provide a more specific example of the origins/sites that are interacting in this way in your use case?

The example given by @iambmelt is the key one we're concerned about: a top-level application (e.g. Teams) which needs to embed our IDP for its own authentication, and then also embeds cross-origin iframes (i.e. Teams addins) which we also want to be able to get SSO with the same user identity that is signed into the top-level app (without user interaction, whenever possible).

bvandersloot-mozilla commented 1 year ago

I agree with the analysis by @arturjanc in the linked doc. Linking to the old issue where per-frame vs. per-page access was discussed: #3.

bvandersloot-mozilla commented 1 year ago

> > Could you provide a more specific example of the origins/sites that are interacting in this way in your use case?

> The example given by @iambmelt is the key one we're concerned about: a top-level application (e.g. Teams) which needs to embed our IDP for its own authentication, and then also embeds cross-origin iframes (i.e. Teams addins) which we also want to be able to get SSO with the same user identity that is signed into the top-level app (without user interaction, whenever possible).

One alternative here is to move the user activation check to immediately before the "grant" logic (determine the storage access policy). This would allow activation-less checks of the existing map, but not activation-less permission prompts or browser-specific autogrant checks.
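The reordering suggested here could look like the following sketch: an activation-less lookup in the existing grant map, with the user-activation requirement moved to just before the prompt or autogrant decision. `requestStorageAccessSketch`, its outcome strings, and the `decide` callback are hypothetical stand-ins, not spec text.

```typescript
// Hypothetical sketch: consult existing grants without user activation, and
// require activation only before a prompt or browser-specific autogrant.
type Outcome = "granted-existing" | "granted" | "denied" | "needs-activation";

interface RsaContext {
  key: string; // e.g. "<top-level site> | <embedded origin>"
  hasUserActivation: boolean;
}

function requestStorageAccessSketch(
  ctx: RsaContext,
  grantMap: Set<string>,
  decide: (key: string) => boolean, // stands in for a prompt or a UA-specific autogrant
): Outcome {
  // 1. Activation-less lookup in the existing grant map.
  if (grantMap.has(ctx.key)) return "granted-existing";
  // 2. Only now is a user gesture required ...
  if (!ctx.hasUserActivation) return "needs-activation";
  // 3. ... before "determine the storage access policy" may prompt or autogrant.
  if (!decide(ctx.key)) return "denied";
  grantMap.add(ctx.key);
  return "granted";
}

const existingGrants = new Set<string>();
const key = "site.example | https://idp.example";
const first = requestStorageAccessSketch({ key, hasUserActivation: false }, existingGrants, () => true); // "needs-activation"
const second = requestStorageAccessSketch({ key, hasUserActivation: true }, existingGrants, () => true); // "granted"
const third = requestStorageAccessSketch({ key, hasUserActivation: false }, existingGrants, () => false); // "granted-existing"
```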

annevk commented 1 year ago

I left some inline comments directly on the document. Overall I tend to agree with the observations it makes, though I'm a little doubtful that requiring CORS will move the needle much.

To the point that @johnwilander raised, it does not seem inconceivable to me to not prompt again for a given site, but do require each origin to independently call rSA. So while the end-user prompt's scope is the site, the scope of impact is the origin that made the call. I think that would get you most of the security benefits. In the specification we'd only have to define the latter, as the end-user prompt's scope is very much up to the user agent. (This might be better discussed in #39, which already goes into some of the issues we might run into if we go down that path.)
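The "don't prompt again per site, but require each origin to call rSA" idea separates two keys: the remembered user decision (per site pair) and the resulting access (per calling origin). A hypothetical TypeScript sketch, again approximating a site as the last two host labels; `StorageAccessState` and its methods are illustrative names only.

```typescript
// Hypothetical model: prompt decisions remembered per site pair, access
// granted per calling origin. A "site" is the last two host labels.
function siteOf(origin: string): string {
  const host = origin.replace(/^https?:\/\//, "").split(":")[0] ?? "";
  return host.split(".").slice(-2).join(".");
}

class StorageAccessState {
  // User decisions, remembered per <top-level site, embedded site>.
  private readonly decisions = new Map<string, boolean>();
  // Actual access, granted per <top-level site, embedded origin>.
  private readonly grants = new Set<string>();
  private readonly askUser: () => boolean;
  promptsShown = 0;

  constructor(askUser: () => boolean) {
    this.askUser = askUser;
  }

  requestStorageAccess(topLevel: string, embedded: string): boolean {
    const siteKey = `${siteOf(topLevel)}|${siteOf(embedded)}`;
    let allowed = this.decisions.get(siteKey);
    if (allowed === undefined) {
      this.promptsShown++;
      allowed = this.askUser();
      this.decisions.set(siteKey, allowed);
    }
    // Only the calling origin gains access, even when no prompt was shown.
    if (allowed) this.grants.add(`${siteOf(topLevel)}|${embedded}`);
    return allowed;
  }

  hasAccess(topLevel: string, embedded: string): boolean {
    return this.grants.has(`${siteOf(topLevel)}|${embedded}`);
  }
}

const saState = new StorageAccessState(() => true); // user accepts the single prompt
const topSite = "https://site.example";
saState.requestStorageAccess(topSite, "https://widget1.site2.example"); // prompts once
const widget2Before = saState.hasAccess(topSite, "https://widget2.site2.example"); // false: has not called rSA
saState.requestStorageAccess(topSite, "https://widget2.site2.example"); // no second prompt
const widget2After = saState.hasAccess(topSite, "https://widget2.site2.example"); // true
```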

johannhof commented 1 year ago

I've written up some details about "do not prompt again for a given site, but do require each origin to independently call rSA" in #122, essentially it means walking back from the per-page model but I think it's worth it for several reasons listed in that issue.

johannhof commented 1 year ago

I'll go ahead and claim that we resolved the overall concern here through #141, thanks for the discussion everyone and especially @arturjanc for discovering this issue!

arturjanc commented 1 year ago

Thanks for all the work on resolving this and switching to a per-frame model in https://github.com/privacycg/storage-access/issues/122, @johannhof @cfredric.

IMO the change strikes the right balance and is enough to address the concerns outlined above (and avoids much of the complexity we'd have to introduce to achieve the same security properties in a per-page model).