Reloading frame after granted storage access to enable efficient site isolation

Apple WebKit is interested in exploring if we can allow browser engines to reload a cross-site frame after it is granted storage access through the Storage Access API. That would allow web engines on resource-constrained platforms to only use site isolation for sub frames with access to unpartitioned storage.

There is a difference in data leakage risk between cross-site frames with partitioned storage and cross-site frames with unpartitioned storage. It's most important to isolate documents with personal content, and personal content almost always implies unpartitioned storage. Put differently, being logged in almost always implies access to first-party cookies.

The switch to an isolated process could be done either by a mandatory frame reload on granted storage access (crude) or in a way that's controlled by JavaScript in the frame (perhaps more elegant). For the latter, imagine this for cross-site frame X:

X calls document.hasStorageAccess() and learns that it does not currently have access.
X indicates with a button that the user needs to interact with the content to activate it.
The user taps the button in X.
X calls document.requestStorageAccess().
The user has been prompted previously and opted in, so the browser automatically grants storage access.
X gets the result back that it was granted storage access, but cookie access is not yet open.
X calls document.reloadOnStorageAccess().
The browser checks if X was granted storage access, and since it was, reloads it.

Two key things here:

In step 8, the engine can switch process for X and ensure site isolation before personal content is loaded.
Between step 6 and 7, we could consider allowing X to tell its server about the reload, for instance by supplying a same-site reload URL in the call to document.reloadOnStorageAccess(). This would forward information from X's partitioned state to its unpartitioned state which obviously has privacy implications. But it would allow for the frame reload to be part of a natural content flow.

Hi John, I've got some questions on this proposal.

If I understand correctly, this suggestion is specifically for the scenario where the site-isolation status ought to change after the embed obtains access to its unpartitioned cookies. E.g., top-level foo.com embeds bar.com, and both documents are in the same process; then bar.com requests storage access and reloads, giving the browser the opportunity to load the new document in a separate process. To put it another way, if the site-isolation status does not need to change, then this proposal isn't necessary. (I.e. if foo.com and bar.com are initially in separate processes anyway, then a call to document.requestStorageAccess() doesn't require a reload before accessing sensitive unpartitioned cookies.)

With that in mind:

Can you use heuristics (e.g. "does this site already have unpartitioned cookies?") to preemptively apply site isolation, so that you can avoid a reload? This would lead to a better experience for both developers and users, without compromising on security.
- Chrome (including on Android) does this already (except on devices with <2GB memory).
If you're worried that such a heuristic would have too many false positives (and thus would have too much of a performance hit), we should consider making the COOP header a requirement for resolving calls to document.requestStorageAccess(). If we require COOP, then no reload is necessary.
- We could also consider a stricter requirement, that window.crossOriginIsolated be true in order for a call to document.requestStorageAccess() to resolve. That would also require the COEP header, if I understand correctly.
I don't quite see why document.reloadOnStorageAccess() is needed, or how it differs from what can already be done via e.g. document.hasStorageAccess() and location.reload(). If it is solely to let the browser propagate the "has storage access" state to the new document (after the reload), note that #141 has a provision for this without requiring a new API (see "Changes to navigation" in the spec). If there is another reason or other desired behavior, can you clarify what it is?

When evaluating this proposal, I think we should also keep in mind the priority of constituencies. I'm unconvinced that the performance wins of allowing an in-process cross-site iframe (before the document.requestStorageAccess() call) outweigh the security, ergonomics, and performance benefits of the alternative (i.e. requiring that the embedded iframe starts isolated, at least in the cases where it needs to request storage access).

On further reading, I misunderstood COOP/COEP/crossOriginIsolated. Those are about loading resources in separate BrowsingContextGroups, rather than separate processes. (And iframes are necessarily in the same BCG as their embedder [even though they may be in separate processes], since they must be able to communicate - so COOP is only relevant for top-level documents.) So that alternative isn't viable.

I do still think that preemptively applying site isolation to sites that are likely to request storage access is a good idea, however. If the heuristic approach would lead to too many false positives, we could consider introducing a new response header (without which calls to document.requestStorageAccess would automatically reject) which indicates "this document plans to request storage access". That way, the browser has a clear signal to load the document in a separate process if possible. (The "if possible" caveat is intentional; I don't think we want to break/remove this functionality on resource-constrained devices that cannot do site isolation.)

(That header idea feels somewhat related to my Supports-Loading-Mode: with-storage-access header suggestion in https://github.com/privacycg/storage-access/issues/122#issuecomment-1325381400. If the server knows the script will request storage access, and can tell the browser so upfront, then the browser could load the document with storage access already granted if the requestStorageAccess call would automatically resolve, and otherwise, could load the document without storage access but with the requestStorageAccess method "enabled".)

Hi @johnwilander, sorry for taking a bit to get to this. I think this is an interesting proposal given the security properties we're trying to impose on storage access right now (including per-frame). On the other hand, @cfredric highlights some valid concerns that I'm struggling to balance against the benefits of this proposal.

My main concern is that SAA requirements for developers are difficult enough as they are, as we've taken steps to protect against tracking, prompt abuse, and lately cookie-based attacks. I'd like to avoid a "feature creep" adding more responsibilities to the API (e.g. being a control for site isolation) that will further restrict and complicate the API to the point where we might as well not ship it. So, in discussing this improvement, I want to stay grounded to the usefulness for developers.

With that said, this could fall into the "just beneficial enough but still usable" goldilocks zone, heavily depending on the actual utility this provides for site isolation. I think that will be the main thing to figure out.

Chris and I are not necessarily experts on site isolation but luckily we have a number of experts on Chrome that we consulted with, and that conversation brought up a few more points:

In Chrome we have existing heuristics that enable site isolation. Those are explicitly not meant to trigger or interact with web-exposed behavior as those could be changed by the platform in the future depending on a variety of things such as resource availability.
Specifically, Chrome has platform modes where site isolation is not enabled at all, due to resource or implementation constraints. As such, we could not ship an API that makes any promise about site isolation and would be hesitant to do anything that implicitly projects these guarantees to developers. I expect this to be a challenge for other platforms as well and I think this consideration makes sense for web APIs in general.
At the last call there was some discussion of an A > C; B > C tracking threat (where C is the embedded tracker), i.e. a "Spectre self-attack" from C. While we appreciate the concern in theory, it doesn't seem practical to discuss at this time. There is an entire class of unsolved (and possibly impossible to fully solve) attacks that enable similar capabilities, from fingerprinting to global limits and caches.

I think Chris already relayed these points at the last PCG call, but I wanted to reiterate them here for visibility. I've also copied this to folks internally and encouraged them to chime in.

To move things forward, I also discussed this with @annevk and @bvandersloot-mozilla and (without trying to speak for them) both seemed somewhat relaxed about the idea of requiring a reload for access to cross-site cookies. So, I can see us further discussing the design details for that. I'll start with a few (strong) opinions that touch on the points above:

This API should not mention site isolation in its name or make any indication that it could be used to enable site isolation to developers (the spec could make an informative note that this exists to give UAs the possibility for isolation).
We should move away from "implicit" storage access given to documents based on actions of previous documents towards a world where interest in storage access is explicitly declared, i.e. through the header that @cfredric is proposing. So, I'm not a huge fan of a "reloadWithStorageAccess" function.
Other (future) aspects of the API could be affected by this change. For example, we could be more comfortable with handing out (optional) DOM Storage access as part of SAA when it always happens as part of a fresh document load. This could reduce our efforts on things like Bucket based access (#102)

On a closer read, I'm not particularly fond of the semantic change of document.requestStorageAccess to not actually change the behavior of the current document and instead affect a future document. If it were an alternate function document.requestStorageAccessAndReload that appeared to devs as a convenience function, then I'd be more on board.

As-is this breaks the following use-structure that I see as the easiest way to use the API:

A developer has some function code() that they need storage access for.

The developer calls document.requestStorageAccess()->then(code)
On exception, the developer renders a clickable UI with callback document.requestStorageAccess()->then(code).

Tangentially, this cracks into something I've been thinking of lately: now that the has storage access lives on the environment, is document the right object for these functions to live on? If the answer is "no, window makes more sense" then maybe requiring a document to reload makes more sense, i.e. by defining a document "has storage access" as the conjunction of the environment's "has storage access" value and a header on the document's HTTP response.

As is, I agree with Johann's first two strong opinions though: "this API should not mention site isolation in its name" and "we should move away from "implicit" storage access given to documents based on actions of previous documents".

I discussed this some more with colleagues and we're amenable to a more backwards-compatible change along the lines @bvandersloot-mozilla suggests. Either a new method or perhaps a dictionary with a navigate member accepting a same-origin destination. The latter might already be a flow that many websites have whereby they show a partitioned view and once they get cookie access they navigate to the non-partitioned view.

The existing code requestStorageAccess() path would then end up failing in the scenario where a) the embedded site is already loaded into the non-isolated process and b) only then does the user visit the embedded site in a top-level navigable.

Does that seem like something that could work?

We discussed this at the editor's call and I had some thoughts/concerns which I've since confirmed with some other folks on the Chrome side:

It's not clear if Chrome (or Firefox) would make use of this feature but we're generally open to discussing how it can work for WebKit without too much developer friction.
It seems like WebKit will already isolate processes for sites that had prior top-level user interaction. There's likely a big overlap between sites that fall under this heuristic and sites that call the SAA. Thus, we seem to be really only talking about edge cases even for WebKit here.
If we add an "optional" navigate parameter without which your code fails, say, 10% of the time, then it's really not optional because who would want that to happen, as a developer?
Considering the above points, it seems wasteful to effectively force a navigate parameter and the associated navigation on developers, which will result in worse performance and overall UX for end users in cases where developers wouldn't need to navigate/reload. It definitely seems like an inversion of the Priority of Constituencies.

It would be nice if instead we gave the developer the ability to deal with these rarer failures when they happen, by initiating a navigation on their own. My preferred method of doing this would be exposing the failure reason through a special exception that gets thrown after the user consented to the SAA prompt, e.g.

document.requestStorageAccess()
  .then(() => loadCredentials())
  .catch((e) => {
    if (e.name == "SecurityError" && e.message.contains("navigate")) {
       location.reload();
    } 
  });

I agree with Ben's above point that this probably shouldn't magically affect the next document load (something that @cfredric and I had also been discussing) and instead we should probably require developers to go through the regular rSA flow again after they handled the error with a navigation (just this time being autogranted because they have the permission). This would also make things more compatible with the storage handle approach suggested in #102.

Discussing this with colleagues we could live with the specific exception option. I suggest we reuse "NoModificationAllowedError" for this purpose. (Branching on message is an extreme anti-pattern we should not encourage anywhere, for what it's worth.) When that exception is thrown, your request in principle succeeded, but you'll have to create a new document and call requestStorageAccess() again in that new document in order to be able to use it. And of course that would have to happen within some reasonable timeframe, so the end user can actually be assumed to remember having made that particular decision.

Thanks Anne, that seems reasonable to me!

privacycg / storage-access

Reloading frame after granted storage access to enable efficient site isolation #154