whatwg / html

HTML Standard
https://html.spec.whatwg.org/multipage/

Document sharing across session history entries following redirects & traversal #6680

Open jakearchibald opened 3 years ago

jakearchibald commented 3 years ago

Some changes I want to make to the session history traversal spec (as part of https://github.com/whatwg/html/pull/6315) that may need their own discussion:

In the current spec, each history entry has its own 'document' property. Some history entries will use the same document (pushState, hash change). Updating these is done per history entry, e.g. "for each history entry that has a document equal to previousDocument…".

I don't think this works, so I'm replacing it with a document state, where multiple history entries can point to the same document state. This contains a document or null, but also data needed to reconstruct that document later.

This means that a document can be replaced with a new document (reload), or become null (discard following navigation/traversal) and later 'revived' (traversal back), and those multiple history entries will automatically point to the new document.
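A minimal sketch of that sharing, to make the idea concrete. The type and function names (`Doc`, `DocumentState`, `HistoryEntry`, `discard`, `repopulate`) are illustrative assumptions, not spec terms, and the "data needed to reconstruct" is reduced to a single URL here.

```typescript
// Hypothetical, simplified model; names are illustrative, not spec terms.
interface Doc {
  url: string;
}

interface DocumentState {
  // The live document, or null once it has been discarded.
  document: Doc | null;
  // Data needed to recreate the document later (reduced here to a URL).
  urlToFetch: string;
}

interface HistoryEntry {
  url: string;
  documentState: DocumentState;
}

const state: DocumentState = { document: { url: "/a" }, urlToFetch: "/a" };

// Three entries created by fragment navigations share one document state.
const entries: HistoryEntry[] = [
  { url: "/a", documentState: state },
  { url: "/a#foo", documentState: state },
  { url: "/a#bar", documentState: state },
];

// Discard: the document goes away, but the shared state object survives.
function discard(s: DocumentState): void {
  s.document = null;
}

// Revive or reload: a fresh document is created from the stored data.
function repopulate(s: DocumentState): Doc {
  const doc: Doc = { url: s.urlToFetch };
  s.document = doc;
  return doc;
}

discard(state);
const allNull = entries.every((e) => e.documentState.document === null);

const revived = repopulate(state);
const allRevived = entries.every((e) => e.documentState.document === revived);
```

Because the entries hold a reference to the state rather than to the document, no "for each history entry…" loop is needed: one write to the state is visible from every entry that shares it.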

This means if you navigate:

  1. /a#foo
  2. /b
  3. /a#bar

…then, assuming previous entries' documents have been discarded, and you go(-2), you will get a different document. This matches Chrome/Firefox behaviour. The spec suggests that the same document should be used, because the URL only differs by hash. Safari kinda follows the spec here, but it's really buggy. The spec behaviour also gets really confusing when you factor in bfcache, because the bfcache behaviour will result in a document swap, whereas a no-bfcache traversal will stick with the current document.
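The example above can be sketched as follows. This is a toy model under one assumption: a navigation shares the current entry's document state only when it is a fragment change relative to the current entry, otherwise it gets a new state. `navigate`, `stateIdAt`, and the other names are hypothetical.

```typescript
// Hypothetical model: the sharing rule below is an assumption for illustration.
interface SharedDocState {
  id: number;
}

interface SessionEntry {
  url: string;
  docState: SharedDocState;
}

function withoutFragment(url: string): string {
  const hashIndex = url.indexOf("#");
  return hashIndex === -1 ? url : url.slice(0, hashIndex);
}

let nextStateId = 1;
const sessionEntries: SessionEntry[] = [];

function navigate(url: string): void {
  const current = sessionEntries[sessionEntries.length - 1];
  let docState: SharedDocState;
  if (current !== undefined && withoutFragment(current.url) === withoutFragment(url)) {
    // Fragment-only change relative to the current entry: share its state.
    docState = current.docState;
  } else {
    // Anything else is a cross-document navigation: new state.
    docState = { id: nextStateId++ };
  }
  sessionEntries.push({ url, docState });
}

function stateIdAt(index: number): number | undefined {
  return sessionEntries[index]?.docState.id;
}

navigate("/a#foo");
navigate("/b");
navigate("/a#bar");

// go(-2) from the third entry lands on the first entry. The two URLs differ
// only by hash, yet the entries never shared a document state.
const goBackTwoChangesDocument = stateIdAt(0) !== stateIdAt(2);
```

In this model the decision to share is made once, at navigation time, so a later traversal never has to compare URLs to work out whether two entries "should" use the same document.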

Although multiple history entries can point to the same document state, there are cases where a history entry should change its own document state, meaning it no longer shares state with history entries it was previously sharing state with.

If an attempt to repopulate a history entry's document state's document (either due to reload, or traversal) results in a response from a different URL to the current document (due to redirects), then its entire document state will be swapped for a new one. Other history entries may continue to use and share the old state.
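A rough sketch of that swap, assuming the caller already knows the final response URL after redirects. All names (`applyResponse`, `ReloadableEntry`, and so on) are hypothetical, and comparing URLs with the fragment removed is an assumption made for the sketch.

```typescript
// Hypothetical model of the proposed swap; names are illustrative only.
interface LoadedDoc {
  url: string;
}

interface ReloadableState {
  document: LoadedDoc | null;
}

interface ReloadableEntry {
  url: string;
  docState: ReloadableState;
}

function urlWithoutHash(url: string): string {
  const hashIndex = url.indexOf("#");
  return hashIndex === -1 ? url : url.slice(0, hashIndex);
}

// responseUrl is the URL of the final response, after any redirects.
function applyResponse(entry: ReloadableEntry, responseUrl: string): void {
  const newDoc: LoadedDoc = { url: responseUrl };
  if (urlWithoutHash(responseUrl) !== urlWithoutHash(entry.url)) {
    // Different URL: this entry gets a brand new state of its own.
    // Entries that shared the old state keep it, untouched.
    entry.docState = { document: newDoc };
    entry.url = responseUrl;
  } else {
    // Same URL: update the shared state, so every sharing entry sees it.
    entry.docState.document = newDoc;
  }
}

const oldState: ReloadableState = { document: { url: "/a" } };
const entry1: ReloadableEntry = { url: "/a", docState: oldState };
const entry2: ReloadableEntry = { url: "/a#foo", docState: oldState };
const entry3: ReloadableEntry = { url: "/a#bar", docState: oldState };

// Reload entry 3, but the server redirects to /b.
applyResponse(entry3, "/b#bar");
```

After the call, `entry3` points at its own state holding the `/b#bar` document, while `entry1` and `entry2` still share `oldState` and its original `/a` document.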

The current spec doesn't perform this swap, which is bad especially if the redirect was cross-origin. Thankfully browsers appear to behave as above.

It's less clear what to do if the response ends up with the same URL, but via redirects. Firefox swaps if there's any redirect, whereas Chrome only cares about the eventual URL. I'm a little worried about that in the case of cross-origin redirects, but I don't have an attack in mind.
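The two behaviours can be written as two policies over the redirect chain. This is only a sketch of the difference as described above, not how either browser implements it; `SwapPolicy`, `shouldSwapDocumentState`, and the `redirectChain` shape (every URL visited, final URL last) are assumptions.

```typescript
// Hypothetical policies modelled on the behaviours described above.
type SwapPolicy = "any-redirect" | "final-url-only";

function dropFragment(url: string): string {
  const hashIndex = url.indexOf("#");
  return hashIndex === -1 ? url : url.slice(0, hashIndex);
}

// redirectChain lists every URL visited; the last item is the final URL.
function shouldSwapDocumentState(
  policy: SwapPolicy,
  entryUrl: string,
  redirectChain: string[],
): boolean {
  const finalUrl = redirectChain[redirectChain.length - 1] ?? entryUrl;
  const finalUrlDiffers = dropFragment(finalUrl) !== dropFragment(entryUrl);
  if (policy === "any-redirect") {
    // Firefox-like: any redirect at all resets the document state.
    return redirectChain.length > 1 || finalUrlDiffers;
  }
  // Chrome-like: only the eventual URL matters.
  return finalUrlDiffers;
}

// Reloading /a#bar, which redirects via /b back to /a#hello.
const viaOtherPage = ["/a", "/b", "/a#hello"];
const firefoxLike = shouldSwapDocumentState("any-redirect", "/a#bar", viaOtherPage);
const chromeLike = shouldSwapDocumentState("final-url-only", "/a#bar", viaOtherPage);
```

The policies only disagree when a redirect chain ends back at the entry's own URL, which is exactly the case where an intermediate cross-origin hop would go unnoticed under `"final-url-only"`.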

Tests

All tests use `no-store` to prevent bfcache.

  1. Nav to https://redirect-session-history.glitch.me/a - entry1
  2. Nav to `/a#foo` - entry2 (now we have two entries with same doc)
  3. Nav to `/a#bar` - entry3 (now we have three entries with same doc)
  4. Nav to `/b` - entry4
  5. Go back, but it redirects to `/c`.
  6. Go back.
  7. Go back.

Spec, Chrome, Firefox, Safari: First back goes to `/c#bar`. Entries 1 and 2 no longer share a doc with 3, but entries 1 & 2 still share a doc.

  1. Nav to https://redirect-session-history.glitch.me/a - entry1
  2. Nav to `/a#foo` - entry2 (now we have two entries with same doc)
  3. Nav to `/a#bar` - entry3 (now we have three entries with same doc)
  4. Reload, but redirects to `/b`
  5. Go back.

Spec: Entry 3 becomes `/b#bar`. Entries 1 and 2 are updated to use the same doc as the new entry 3. This seems very wrong, especially if the redirect was cross-origin.

Chrome, Firefox, Safari: Reload makes entry 3 `/b#bar`. Entries 1 and 2 no longer share a doc with 3, but entries 1 & 2 still share a doc.

  1. Nav to https://redirect-session-history.glitch.me/a - entry1
  2. Nav to `/a#foo` - entry2 (now we have two entries with same doc)
  3. Nav to `/a#bar` - entry3 (now we have three entries with same doc)
  4. Reload, but redirects to `/a#hello`
  5. Go back.

Spec, Chrome: Entry 3 becomes `/a#hello`. Entries 1 & 2 are updated to use the same doc as the new entry 3.

Safari: As spec/Chrome. However, going back to entry 1 doesn't remove the hash. Then going forward seems to remove entry3?? Must be a bug.

Firefox: Entry 3 becomes `/a#hello`. Entries 1 and 2 no longer share a doc with 3, but entries 1 & 2 still share a doc.

Results are the same even if the redirect goes via `/b` before redirecting back to `/a#hello`.

Results are the same even if the redirect goes via another origin before redirecting back to `/a#hello`.

  1. Nav to https://redirect-session-history.glitch.me/a - entry1
  2. Nav to `/a#foo` - entry2 (now we have two entries with same doc)
  3. Nav to `/b`, but it redirects to `/a#hello` - entry3
  4. Go back.

Spec: When navigating back, if only the hash differs, continue to use the same doc. (If the doc is still there in bfcache, it'll use the bfcached doc).

Chrome, Firefox: Entry 3 has a different doc to entries 1 and 2 (which continue to share a doc).

Safari: When navigating back, if only the hash differs, continue to use the same doc. But then, session history gets weirdly broken as seen in previous tests.

  1. Nav to https://redirect-session-history.glitch.me/a - entry1
  2. Nav to `/a#foo` - entry2 shares doc
  3. Nav to `/b` - entry3
  4. Nav to `/c` - entry4
  5. Go back, but it redirects to `/a#hello`
  6. Go back (since it's just a hash traversal)

Spec: Going back from entry 3 to 2 should use the same document because it's just a hash change.

Chrome, Firefox: Entry 3 uses a different document to entries 1 & 2. Entries 1 & 2 continue to use the same document.

Safari: Going back from entry 3 to 2 uses the same document, but session history gets buggy as in previous tests.

  1. Nav to https://redirect-session-history.glitch.me/a - entry1
  2. Nav to `/a#bar` - entry2 same doc
  3. Nav to `/b` - entry3
  4. Nav to `/a` - entry4
  5. Nav to `/a#foo` - entry5
  6. go(-3)

Spec: It's just a hash navigation, so there's no doc change.

Chrome, Firefox: Doc change.

Safari: No doc change, but session history gets buggy as in previous tests.

domenic commented 3 years ago

> …then, assuming previous entries' documents have been discarded, and you go(-2), you will get a different document. This matches Chrome/Firefox behaviour.

Updating the spec to match Chrome/Firefox here makes sense to me.

> The current spec doesn't perform this swap, which is bad especially if the redirect was cross-origin. Thankfully browsers appear to behave as above.

Phew!

> It's less clear what to do if the response ends up with the same URL, but via redirects. Firefox swaps if there's any redirect, whereas Chrome only cares about the eventual URL. I'm a little worried about that in the case of cross-origin redirects, but I don't have an attack in mind.

I lean toward it being simpler to just reset on any redirects. Maybe @csreis or @rakina could let us know if this sounds like a feasible thing for Chrome to implement, in the fullness of time.

rakina commented 3 years ago

Charlie can correct me here, but I think we are already resetting some states for cross-SiteInstance redirects in Chrome. So changing a navigation from same-document to cross-document after a cross-origin navigation is probably doable as well (currently the cross/same-document classification is determined and fixed at the start of navigation).

jakearchibald commented 3 years ago

Whatever we decide here will influence what happens to history.state https://github.com/whatwg/html/issues/6213#issuecomment-870412074.