whatwg / storage

Storage Standard
https://storage.spec.whatwg.org/
Other
126 stars 55 forks source link

Session storage and changing browsing contexts #119

Open jakearchibald opened 3 years ago

jakearchibald commented 3 years ago

I'm trying to figure out how session storage should behave in a few similar cases.

I think the possible options for each of these are:

a) The new page has fresh, empty session storage b) The new page has the same session storage c) The new page has a clone of session storage

…where b is generally what happens while navigating around in the same tab, and c happens when opening a popup.

References to "isolated" here mean Cross-Origin-Opener-Policy: same-origin.

1. User navigates from non-isolated pageA to isolated pageB in the same tab. My gut feeling is that "b" is ok here. Only visible data would be able to pass from pageA to pageB. This may be the first time a single unit of session storage has been shared between a (process, origin) pair, although it's pretty common with localstorage.

2. pageA performs a <link rel="prerender"> of pageB. User navigates from pageA to pageB in the same tab and the prerender is used. A prerender will exist in a new browsing context, so this is similar to the previous case, and imo have the same result, "b". The difference is, in the same-origin case, it's more likely that each process will be accessing session storage at the same time.

3. User navigates from pageA to pageB by changing the URL in the URL bar. This case feels different to me. Since the navigation hasn't been caused by web content within the session, "a" feels more appropriate. However, if the user clicks back, the previous session storage should be present.

4. Non-isolated pageA opens a popup to isolated pageB. Again, it seems ok to continue with the current behaviour, which is "c". That seems to be what Chrome does now.

@annevk @rakina does the above make sense?

If so, I think I'll change what I'm currently calling "browsing session" to "top level navigable" (making a special kind of navigable), since it's supposed to represent a window/tab. Then I'll give browsing contexts a session bucket/ID. Multiple browsing contexts can share the same session bucket/ID, and a top level navigable may contain history entries that have browsing sessions with different session bucket/IDs.

This would also allow us to add logic that prevents history.back() traversing to a history entry with a different session than the current entry, although the browser back button would not have this restriction.

annevk commented 3 years ago

I tend to agree with your assessment, but I would love to hear from @asutherland and maybe @janvarga about sharing sessionStorage across the process boundary. If sessionStorage has a different storage architecture from localStorage I wonder if it's worth salvaging or whether sessions should be reintroduced as part of buckets (and not have the problematic aspects there, such as cloning).

jakearchibald commented 3 years ago

I think there'd be a lot of value in something like session-idb, and yeah, it wouldn't clone when opening auxiliary browsing instances (although the "duplicate tab" feature might).

mkruisselbrink commented 3 years ago

The chrome implementation of session storage today does rely in a number of ways on there always being at most one renderer process accessing a particular session storage. It's been a while since I looked at/touched that code, but I think this made it much easier to ensure we always dispatch the right events even in the presence of copy-on-write behavior. So while supporting sharing one session storage across multiple renderer processes is probably possible, it would certainly be a non-trivial effort.

mkruisselbrink commented 3 years ago

Having said that, it doesn't sound like case 1) would actually involve two processes accessing the same storage at the same time, so it would still be okay. 2) might be more problematic if that does actually let two processes access the same storage at the same time. My understanding was that at some point portals was considering doing the same, but ended up deciding to block storage before activation instead (partially to solve this problem).

asutherland commented 3 years ago

Firefox's SessionStorage implementation currently assumes only a single content process (akin to renderer) will actively be using the storage for a given (Top level browsing context (session) id, origin) tuple. It's expected and handled that as history is traversed that this will result in this notional active process changing. In the event the concurrent access assumption is violated, each content process will effectively fork the state on first SessionStorage access and live in their own universes, never to perceive the canonical state that a new content process would see.

If needed, this could instead be made to converge with the current LocalStorage NextGen implementation wherein multiple processes can access the same data with a consistent snapshot established for the duration of a task the first time LS is accessed within an task, with atomic write-back of any deltas occurring at the end of the task. In the event of a race, mutation batches will stack and all processes will perceive the canonical state during the next task. Sync-ish IPC is sadly involved.

In general it would be nice if the single content/renderer process invariant can be maintained. Which maybe is to say that a SessionStorage storage bottle can only be accessed from one agent cluster at a time? In cases where that would be violated, we must clone/copy/stick-in-the-replicator the bottle or create a new empty bottle. It sounds like that's what's being proposed here?

annevk commented 3 years ago

I think we're actually fine and I slightly misunderstood the implications of something Jake said. When you navigate from origin A without COOP to origin A with COOP there is a process swap, but it's not the case that both would have access to the same sessionStorage simultaneously.

jakearchibald commented 3 years ago

Yeah it's the prerender case where access can happen simultaneously.

jakearchibald commented 3 years ago

Would a "clone & swap" system like this be easier than truly shared storage @mkruisselbrink @asutherland?

  1. A top-level page contain a same-origin prerender. The same-origin prerendered page receives a clone of session storage.
  2. Wait until the prerendered page activates.
  3. Switch the session storage of the now-activated prerendered page from its clone to the session storage used by the old top-level page.
  4. Fire "storage" events for every difference between the old cloned storage and the new storage.

This means that clicking back would take you to the previous page without changing session storage.

asutherland commented 3 years ago

Are the modifications performed by the pre-rendering page to sessionStorage discarded by the bottle transplant? Or is step 4 more than bringing the pre-render up-to-speed with the changes in the old top-level-page and implies also applying the changes made by the pre-render?

jakearchibald commented 3 years ago

Ignore the specifics of step 4 for now, I'm getting ahead of myself.

Are the modifications performed by the pre-rendering page to sessionStorage discarded by the bottle transplant?

Yes, discarded, although some event may provide insight into which keys changed as a result of the transplant, and provide their old value.

So:

  1. Main page has session storage { foo: 'bar', hello: 'world' }.
  2. Prerender created, which has a clone of session storage, so { foo: 'bar', hello: 'world' }.
  3. Main page sets 'username' to 'sarah' in session storage, so { foo: 'bar', hello: 'world', username: 'sarah' }.
  4. Prerender sets 'foo' to 'yo', and '123' to '456', so { foo: 'yo', hello: 'world', '123': '456' }.

If prerender becomes the main page, it's session storage will be replaced with the main page's storage, and (maybe) it gets an event that tells it:

I don't know if this is a good idea yet, but I'm trying to get a sense for how complex it is vs making session storage 'work' across processes.

jakearchibald commented 3 years ago

I asked folks on Twitter how they were using session storage, here's an analysis of how prerendering could disrupt that, and how I landed on the above idea https://docs.google.com/document/d/1cCTBZR6nWsVC2TlQ8PBse7eBb4ro0rtPJxX0zCou1lw.

asutherland commented 3 years ago

Thank you for the clarification and excellent example and summary! Also, thank you very much for creating and linking the amazing document in which you've gathered use-cases and analyzed the impacts of the different solutions on the use-cases!

I think you make a great case for "clone & swap". In particular, I think it:

jakearchibald commented 3 years ago

Cheers! There might still be a race between what the script in the main page sets in session storage before it's cloned for the prerender (edit: although that will be resolved at swap time). The "empty" solution has fewer races, but means the prerender is more likely to be 'correct', and fetch meaningful data.

jakearchibald commented 3 years ago

I've been thinking about the event a bit more, and I don't think we need it. In fact, if "clone & swap" ends up being a stopgap before a shared model, we don't really want a bespoke event that might be useless in future.

There are a few ways for a page to become active 'later':

And in all cases, you need to look at session storage if you have state there. Firefox and Safari do not queue storage events for bfcached pages (Chrome's experimental implementation does, but is what what we want @altimin?).

Maybe the best rule for developers is to check session storage for changes after 'pageshow'? For prerender, we can make sure the storage is swapped before that event.

The only sharp edge is that the 'pageshow' event fires after 'load' for some reason. We'd need to make sure there's an easy way to ignore the 'pageshow' event for a page that wasn't 'hidden'.

asutherland commented 3 years ago

Firefox's storage implementation originally intended to queue storage events but the naive implementation logic for that never actually worked, just temporarily consumed memory as it buffered the events without coalescing, so we removed it in a cleanup.

Defining "pageshow" to mean "welcome to the future! everything has changed! reconsider everything!" is very appealing as an implementer, but if you're going to expose SessionStorage to a pre-rendered page at all, maybe it's better to instead generate synthetic StorageEvent events with the constraint that oldValue will not be present. So for pages that are frozen/etc., they'd just accumulate a list of dirty keys and when they become unfrozen, we fire a bunch of synthetic storage events at them. This seems friendlier to content and with reasonable memory costs since we're just storing a set of keys.

jakearchibald commented 3 years ago
  1. Session storage is in old { foo: 'bar', hello: 'world' } state.
  2. Swap happens (or page is shown after being in bfcache), so it's now { foo: 'yo', hello: 'world', '123': '456' }.
  3. "storage" event for 'foo' key.
  4. "storage" event for '123' key.

I'm worried that, in step 3, the view of storage is the final state, rather than the state between updating key 'foo' and '123'.

Or does that already happen?

asutherland commented 3 years ago

I believe that already happens.

The current spec mutates the single map and calls broadcast which queues a task. This works in a straightforward fashion under a single-process model assuming all windows run on the same event loop/main thread; the current state of the map always reflects the JS code that has executed and the events are after-the-fact delta notifications.

We don't specify what happens under a multi-process model; I think this would be the first time we're addressing it. Practically speaking, this can be very implementation dependent. On Firefox Release, Legacy LocalStorage processes storage broadcasts as they are received via IPC and updates the underlying (local process) map just before dispatching the events so content is perceiving a replay of what happened in the other process like the events had been dispatched synchronously instead of asynchronously via a task. On Firefox Nightly, LocalStorage NextGen approximates the single-process situation wherein a (coherent snapshot of) the single backing map will be seen that is entirely up-to-date when the event is dispatched / perceived.

mkruisselbrink commented 3 years ago

I could have sworn I left a comment earlier, but seems I didn't. This does seem like a pretty reasonable proposal, yes. I do share @jakearchibald's concerns about storage events happening when they don't match the view of storage though. Currently at least in Chrome we try pretty hard to guarantee that the view of storage always matches what events are dispatched.

I wonder if "everything has changed" wouldn't be simpler from a website point of view either... (Not sure what we do today with Local/SessionStorage and bfcache in chrome, I think we just keep queueing up events, perhaps eventually evicting the page from bfcache if there are too many changes? Not sure).

hiroshige-g commented 3 years ago

Draft WPT: https://github.com/web-platform-tests/wpt/pull/31080. This is unstable (timeout) on wpt.fyi, but according to Chromium CI and locally installed Firefox, this confirms the behavior at https://github.com/whatwg/storage/issues/119#issuecomment-785771108 (Chrome fires storage events for modifications during a page is BFCached while Firefox/Safari don't).

hiroshige-g commented 2 years ago

Hi, I'm planning to merge https://github.com/web-platform-tests/wpt/pull/31080 and wondering what is the expected behavior around BFCache: should storage events for storage modification while a page is in BFCache be fired after the page is restored (Chrome's behavior), or not (Safari/Firefox's behavior)? FYI @rakina @mkruisselbrink

hiroshige-g commented 2 years ago

Friendly ping; any opinion about whether we should queue storage events during BFCache or not?

annevk commented 2 years ago

Above @asutherland argued for dispatching a sequence of events for "dirty keys" (with oldValue being null). I think I personally would somewhat prefer not firing events at all and use this renewed interest in bfcache to teach web developers that this can have changed. It seems it has the potential to get rather noisy otherwise.

asutherland commented 2 years ago

I defer to @annevk in all things; my key concern in general would be that anything in BFCache should not accumulate an unbounded list of changes that could result in unbounded memory usage and @annevk's proposal does cover that.

rakina commented 2 years ago

As an outsider, @annevk's proposal in https://github.com/whatwg/storage/issues/119#issuecomment-1115844532 makes sense to me to balance between supporting current usage of the API and not being too much of a burden on the implementation side. Sites can always proactively check for things on "pageshow" too. @mkruisselbrink @jakearchibald please shout if there are problems with this approach!

hiroshige-g commented 2 years ago

OK, the concensus here is https://github.com/whatwg/storage/issues/119#issuecomment-1115844532. I updated the test to expect no events are fired for storage updates while the page is in BFCache, and filed a bug against Chromium (https://crbug.com/1328939).