whatwg / html

HTML Standard
https://html.spec.whatwg.org/multipage/

A new mode for iframes where they don't add to the joint session history? #6501

Open domenic opened 3 years ago

domenic commented 3 years ago

This has come up several times in discussions with @annevk, @csreis, @natechapin, @smaug----, @jakearchibald, and others.

Basically, iframes participating in the joint session history can be problematic for web developers and for users. It means iframes can change the meaning of history.back() and the back button. Although this is sometimes a good thing, in terms of allowing the user to transition between multiple states, in other cases it's not desired, such as when iframing ads or other untrusted third-party code.

Even the app history proposal cannot tame this fully, because it wants to stay based on top of joint session history so as to avoid creating a whole parallel incompatible model. It tries to make things better, by saying that appHistory.back() is guaranteed to move your own frame back (and not just some subframe or parent frame), but that leads to a problem (https://github.com/WICG/app-history/issues/73) where appHistory.back(); appHistory.forward() can take you to a different joint session history entry than the one you started, in a way that is visible in terms of what subframes are showing.

So I'm wondering if we can approach this from a different angle, of allowing web developers to say that a given frame cannot impact joint session history. In particular, I'd propose that this makes all navigations within the frame "replace" navigations. (This is what is currently planned for prerendering and portals, although those are early-stage.) We could also contemplate removing the history.back/forward/go() APIs in such frames since they would only be able to mess with the parent, and maybe we don't want to allow such parent-messing.
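
To make the "replace" idea concrete, here is a minimal sketch, assuming a toy model in which each step of the joint session history is just a map from frame label to URL. The names `ToyJointHistory`, `markReplaceOnly`, and the example URLs are hypothetical and exist only for illustration; real joint session history tracks much more state.

```typescript
// Toy model: each step of the joint session history records which URL
// every frame is showing. Real browsers track far more than this.
type ToyStep = Map<string, string>; // frame label -> URL

class ToyJointHistory {
  private past: ToyStep[] = [];
  private current: ToyStep;
  private future: ToyStep[] = [];
  // Hypothetical: frames the embedder has marked as unable to add entries.
  private replaceOnly = new Set<string>();

  constructor(topUrl: string) {
    this.current = new Map<string, string>([["top", topUrl]]);
  }

  markReplaceOnly(frame: string): void {
    this.replaceOnly.add(frame);
  }

  navigate(frame: string, url: string): void {
    const isInitialLoad = !this.current.has(frame);
    if (isInitialLoad || this.replaceOnly.has(frame)) {
      // "Replace": update the current step in place, adding no entry.
      this.current.set(frame, url);
      return;
    }
    // "Push": keep the old step, drop forward entries, add a new step.
    this.past.push(this.current);
    this.current = new Map(this.current).set(frame, url);
    this.future = [];
  }

  back(): void {
    const previous = this.past.pop();
    if (previous === undefined) return;
    this.future.push(this.current);
    this.current = previous;
  }

  get length(): number {
    return this.past.length + 1 + this.future.length;
  }

  urlOf(frame: string): string | undefined {
    return this.current.get(frame);
  }
}

const jointDemo = new ToyJointHistory("https://embedder.example/");
jointDemo.navigate("ad", "https://ads.example/1"); // initial iframe load
jointDemo.markReplaceOnly("ad");
jointDemo.navigate("ad", "https://ads.example/2"); // replaces, no new entry
jointDemo.navigate("ad", "https://ads.example/3"); // replaces, no new entry
const lengthAfterAdNavigations = jointDemo.length;
jointDemo.navigate("top", "https://embedder.example/#next"); // pushes
jointDemo.back(); // a single step back returns the top-level state
const topAfterBack = jointDemo.urlOf("top");
```

In this sketch the ad frame can navigate any number of times without lengthening the list, so one back step is always enough to undo the embedder's own navigation.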

If people think this is a good idea, the biggest question is whether the embedded content would need to opt-in (e.g., in the style that document policy requires) or whether we could allow the embedder page to impose this on the embeddee without consent (like sandbox="" does).

jakearchibald commented 3 years ago

I figured we might need to solve this problem so I've been designing for it in https://github.com/whatwg/html/pull/6315, but it's a little different to what you're proposing here.

I have:

Currently, browsers only treat the top-level as traversable, but the model will allow for traversables to be nested.

However, in this model, nested traversables would have their own session history of multiple entries, rather than have everything turned into "replace" navigations. These history entries wouldn't be traversable via the back/forward buttons, but could be traversed using history.back() and the new appHistory APIs.

Since the top-level doesn't store the session history of nested traversables, their session history wouldn't be repopulated after going back to the page (unless of course the previous page was kept alive in bfcache).

The benefit of this is you wouldn't need to remove history.back() from these contexts, since it'd only impact the session history of the nearest traversable, not the top level. Also, we wouldn't need to make it opt-in, since the behaviour would be normal, just sandboxed.
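
As a rough illustration of this nested model (not the actual design in the pull request), the sketch below assumes each traversable simply owns a list of URLs and a position. `ToyTraversable` and the URLs are made-up names for the example.

```typescript
// Toy model of one traversable: a list of URLs plus a current position.
class ToyTraversable {
  private entries: string[];
  private index = 0;

  constructor(initialUrl: string) {
    this.entries = [initialUrl];
  }

  push(url: string): void {
    // Drop any forward entries, then append the new one.
    this.entries = this.entries.slice(0, this.index + 1);
    this.entries.push(url);
    this.index = this.entries.length - 1;
  }

  back(): void {
    if (this.index > 0) this.index -= 1;
  }

  get currentUrl(): string {
    return this.entries[this.index] ?? "";
  }

  get length(): number {
    return this.entries.length;
  }
}

// The top level and one nested traversable each keep their own list.
const topTraversable = new ToyTraversable("https://embedder.example/");
const nestedTraversable = new ToyTraversable("https://widget.example/step-1");

nestedTraversable.push("https://widget.example/step-2");
nestedTraversable.push("https://widget.example/step-3");

// history.back() inside the widget only moves the nearest traversable.
nestedTraversable.back();

// The browser back button would act on topTraversable, which still has one entry.
const topLengthSeenByBackButton = topTraversable.length;
```

The widget keeps a real multi-entry history it can traverse itself, while the top-level list never grows because of it.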

The downside is it might be more complicated to implement, but I'm not sure.

annevk commented 3 years ago

I think what @jakearchibald outlines matches what we wanted for #763 (see also the very long discussion at https://github.com/WICG/webcomponents/issues/184). A good initial test for this feature might be deploying it for shadow iframes.

cc @rniwa

domenic commented 3 years ago

What is the appropriate model for how to impose this? I can think of a few options:

Of these permissions policy seems pretty good...

annevk commented 3 years ago

Yeah, although I'm not a big fan of opt-in for third parties, I agree it fits here. (And perhaps we can switch the default over time.)

cc @clelland

domenic commented 3 years ago

Some implementation discussions among the Chrome team revealed that implementing the variant of this that @jakearchibald proposes, with independent session histories for the iframes, would be quite difficult. Also, it could be confusing for users in cases involving iframe restoration.

The proposal we're favoring now is to have this make every navigation in the iframe a "replace" navigation. This still meets the use cases I'm most concerned with, which is letting pages ensure that iframes don't add entries to the joint session history.

Does that still sound reasonable to Mozilla folks? /cc @smaug----

annevk commented 3 years ago

What happens with a nested history.back()? In scenarios where this is forced upon nested documents, history.back() still functioning could end up breaking things in unexpected ways.

I'm also still wondering if we could make this the default for shadow tree nested documents to solve the leakage problem there.

domenic commented 3 years ago

Yeah, it could indeed have that impact. I think that's OK?

I remain doubtful that it would be web-compatible to change how shadow trees work, but we'd welcome anyone running an experiment to prove us wrong.

annevk commented 3 years ago

In the case where this is a policy you force upon descendants you probably would not want any descendant to be able to navigate away from you again. It seems quite easy for a descendant to have some pushState() + back() logic that would break the top once run.

domenic commented 3 years ago

Oh, I see, because back() would still operate on the joint session history. Yeah, I agree that would be bad.

Were you thinking we would just make back() etc. a no-op in these frames?
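
The failure mode can be sketched as follows, assuming a very reduced model where the joint session history is just a list of top-level URLs. The function name, the `ToyBackPolicy` type, and the URLs are hypothetical.

```typescript
type ToyBackPolicy = "traverse-joint-history" | "no-op";

// Simulates a replace-only widget that runs pushState() followed by back().
function simulateWidgetBackPattern(policy: ToyBackPolicy): string {
  // Joint session history as seen from the top: the user came from another page.
  const topLevelUrls = ["https://previous.example/", "https://embedder.example/"];
  let jointIndex = 1;

  // Step 1: the widget calls pushState(). In replace-only mode this becomes
  // a replace, so no entry is added and jointIndex stays where it is.

  // Step 2: the widget calls history.back(), expecting to undo its own push.
  if (policy === "traverse-joint-history" && jointIndex > 0) {
    jointIndex -= 1; // there was nothing of the widget's to undo, so the top moves
  }

  return topLevelUrls[jointIndex] ?? "";
}

const topAfterTraversal = simulateWidgetBackPattern("traverse-joint-history");
const topAfterNoOp = simulateWidgetBackPattern("no-op");
```

With traversal still enabled, the widget's back() takes the whole tab to the previous page; with the no-op policy the embedder stays put.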

annevk commented 3 years ago

I was wondering what you all were thinking since @jakearchibald hinted at this being a problem in https://github.com/whatwg/html/issues/6501#issuecomment-800940078, but no-op would work I think.

jakearchibald commented 3 years ago

@domenic

> Also, it could be confusing for users in cases involving iframe restoration.

In what way?

domenic commented 3 years ago

I think I was making a few logic jumps there that are not, in retrospect, justified. In particular I was thinking "having to save/restore/sync for these independent session history lists would be a lot of work to spec/implement, so, we should probably just not do so and restore from the src="" attribute. Which would lead to a confusing user experience." But yes, if we went to the trouble of implementing and speccing not only independent session history lists, but also saving/restoring them, then I think there are no user experience problems.

So this is mostly a question as to whether the web developer experience gains of allowing non-replace navigations/nontrivial session history in these scenarios are worth the implementation cost. I think the Chrome history engineers' position is that they are not worth it, and I tend to agree.

domfarolino commented 3 years ago

> So this is mostly a question as to whether the web developer experience gains of allowing non-replace navigations/nontrivial session history in these scenarios are worth the implementation cost. I think the Chrome history engineers' position is that they are not worth it, and I tend to agree.

Expanding on the Chrome implementation position, I think what's really hard to implement is actually full separation of top-level and iframe session history in general. So even if we had a replacement-only session history model in iframes, I think what would be really challenging is (a) the back button restoration stuff mentioned earlier, and (b) divorcing the history entirely (i.e., not sharing history.length, having history.back() in an iframe not influence the top-level page). I think that's the tricky part. That is, the implementation complexity is agnostic to the triviality of the session history model in the iframe, but is related to the completeness of the separation in general. I'd like @csreis to correct me though if I am wrong.

domenic commented 3 years ago

> So even if we had a replacement-only session history model in iframes, I think what would be really challenging is (a) the back button restoration stuff mentioned earlier, and (b) divorcing the history entirely (i.e., not sharing history.length, having history.back() in an iframe not influence the top-level page).

Hmm, my impression was different. (a) restoration gets pretty easy if we only have to serialize "current URL" for the iframe, and not an entire history list? Maybe? (b) history.length never changes due to replacement, only due to push. And we can just add a check (e.g. in the renderer process) to bail out for history.back(), etc. in these iframes.

domfarolino commented 3 years ago

> history.length never changes due to replacement, only due to push

But it is still shared. If A.com embeds B.com and C.com where both B and C are sibling iframes, and C.com navigates a million times, I think all of these are observable by A.com and B.com via history.length.

Maybe this is too implementation-specific or "deep" but the question seems to be which one do we want:

The distinction between the two might be overly-pedantic from a spec perspective; if that's the case then sorry.

> Hmm, my impression was different. (a) restoration gets pretty easy if we only have to serialize "current URL" for the iframe, and not an entire history list? Maybe?

That's quite possible! I'm not totally sure so I delegate to Charlie. From an impl POV it does seem like if we don't actually require fully separated iframe history but still have it operate on the same underlying objects (just in a replacement-only manner and with manually patched i.e., history.length), then yeah we'd get today's iframe restoration logic for free I think. Anyways if this comment is getting us off-track, my apologies.

domenic commented 3 years ago

> But it is still shared. If A.com embeds B.com and C.com where both B and C are sibling iframes, and C.com navigates a million times, I think all of these are observable by A.com and B.com via history.length.

What I was getting at was if the million C.com navigations are with replacement, then history.length will not change.

But I agree it provides a communications channel. E.g. if B is in replace-only mode and C is not, then B and C can communicate by B observing history.length and C modifying it.

We could manually patch this out by making history.length return 1 in such replace-only iframes. Probably that's a good idea anyway since we're saying history.back() will no-op; it seems like those two should stay in sync, generally.

If we do such patching, then I hope the two are equivalent. But I agree they're conceptually different, and there might be things we miss patching which could make them diverge :-/.
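
A minimal sketch of that manual patching, assuming a toy shared history and a hypothetical `viewForFrame` helper (neither is a real browser API): replace-only frames get a view whose length is always 1 and whose back() does nothing.

```typescript
// Minimal slice of the History interface that matters for this sketch.
interface ToyHistoryView {
  readonly length: number;
  back(): void;
}

// Stand-in for the shared joint session history.
class ToySharedHistory implements ToyHistoryView {
  private count = 1;
  private position = 0;

  push(): void {
    this.count = this.position + 2; // drop forward entries, then add one
    this.position += 1;
  }

  back(): void {
    if (this.position > 0) this.position -= 1;
  }

  get length(): number {
    return this.count;
  }

  get currentPosition(): number {
    return this.position;
  }
}

// Hypothetical patch: replace-only frames see a constant length and an inert back().
function viewForFrame(shared: ToySharedHistory, replaceOnly: boolean): ToyHistoryView {
  if (!replaceOnly) return shared;
  return {
    length: 1,
    back(): void {
      // Deliberately a no-op so the frame cannot move its embedder.
    },
  };
}

const sharedDemo = new ToySharedHistory();
const viewForB = viewForFrame(sharedDemo, true); // B is in replace-only mode
const viewForC = viewForFrame(sharedDemo, false); // C is a normal frame

sharedDemo.push(); // C pushes an entry
sharedDemo.push(); // C pushes another
viewForB.back(); // ignored
```

Here C's pushes are visible through `viewForC.length` but not through `viewForB.length`, which closes the B/C channel in this toy model. Anything that is not routed through the patched view would still leak, which is the audit concern.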

domfarolino commented 3 years ago

Yeah, that is the only thing I am worried about, basically a slightly incomplete audit and therefore incomplete manual patches.

Side-question:

> Yeah, although I'm not a big fan of opt-in for third parties, I agree it fits here. (And perhaps we can switch the default over time.)

Do we think that we could ever actually make this the default behavior without requiring all content to opt-in to being framed? Right now content can opt-out of being framed and we're discussing the possibility of making all framed content have this replacement-only model in the future. This is a pretty big behavior difference between a given site operating in a top-level context and the same site operating in a framed context, so I feel like this would just require turning the opt-in for this new replacement-only model into an opt-in for "You're being placed in an iframe".

csreis commented 3 years ago

> > Hmm, my impression was different. (a) restoration gets pretty easy if we only have to serialize "current URL" for the iframe, and not an entire history list? Maybe?
>
> That's quite possible! I'm not totally sure so I delegate to Charlie. From an impl POV it does seem like if we don't actually require fully separated iframe history but still have it operate on the same underlying objects (just in a replacement-only manner and with manually patched i.e., history.length), then yeah we'd get today's iframe restoration logic for free I think. Anyways if this comment is getting us off-track, my apologies.

To clarify, it's impractical in Chrome to restore a nested joint session history (e.g., if "nested traversables would have their own session history of multiple entries"), even if that nested history is limited to a single item. In the single item case, we would still need to store more than just the URL: also scroll position, form state, and many other things that get serialized.

However, I agree with @domfarolino that the replace-only + manual-patch suggestion seems to avoid the need for a nested joint session history. If an iframe in this mode is limited to a single session history item and can't otherwise affect the top-level joint session history, then it seems reasonable to store it as a single session history item within the top-level joint session history. That does seem like it would make restoration possible.

rniwa commented 3 years ago

Surely this should be a property of a frame, and not of a specific document / navigation? At least that's what we were discussing for the shadow DOM.

domenic commented 3 years ago

Yep, this would be a property of the iframe (browsing context); that's how permissions policy works.

clelland commented 2 years ago

[Catching up] Using permissions policy for this seems reasonable, with a default allowlist of *, as long as we're not considering the initial option of simply removing back() and friends from the history object. I suspect that doing that would require explicit opt-in from the frame.

Breaking back() gets close to that point, but it's arguably more acceptable to make it a no-op than to cause it to throw.

brandonmcconnell commented 1 year ago

Having opened a related issue (#8773), I wanted to contribute some thoughts here regarding iframes maintaining their own history.

I side most closely with @jakearchibald's proposed spec.

Firstly, I think it's essential that iframes have their own maintained history, and as far as I understand, using replaceState for every navigation within iframes would make it impossible for iframes to traverse their own history. Isolating the history of iframes so they don't pollute the history of the parent window would be a great and logical next step in the evolution of iframes. However, if in doing so, iframes lose the ability to control their own bwd/fwd navigation, I think we create an even larger issue and one that might be harder to come back from.

I think it's imperative that iframes can manage their own history, and that the history for an iframe can be traversed fwd/bwd from the parent frame, at least so that the parent frame can present backward/forward/refresh navigation options.

I don't think we need to worry about restoring nested joint session histories, only the top-level one. As for nested iframes-in-iframes, I think it's perfectly suitable for a nested iframe-in-iframe to also self-maintain its own history, and how that works would be dictated by those nested iframes' allow attribute values.

For example:

Top-level window (A) ◄┄┄┄┄┐ ◄┄┄╳┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┐
 │                        ┆                             ┆
 ├── iframe:not([allow]) (B1) ◄┐                        ┆
 │    │                        ┆                        ┆
 │    └── iframe:not([allow]) (C) ◄┄┄╳┄┄┄┄┄┄┄┄┄┄┄┐      ┆
 │         │                                     ┆      ┆
 │         └── iframe[allow="isolated-history"] (D) ◄┄┐ ┆
 │              │                        ┌┄┄┄┄┄┄┄┄┄┄┄┄┘ ┆
 │              └── iframe:not([allow]) (E)             ┆
 │                                                      ┆
 └── iframe[allow="isolated-history"] (B2) ┄┄┄┄┄┄┄┄┄┄┄┄┄┘

In the above example, without allow="isolated-history", an iframe's history entries are added to its parent frame's history, then to that parent's parent, and so on until a frame that isolates its history is reached.
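
The bubbling rule can be sketched as a walk up the frame tree, assuming the `isolated-history` token from the diagram (a proposed value, not an existing permissions policy feature). `ToyFrameNode`, `makeFrame`, and `historyOwner` are hypothetical names.

```typescript
interface ToyFrameNode {
  label: string;
  // True when the hypothetical allow="isolated-history" is set on the iframe.
  isolatesHistory: boolean;
  parentFrame: ToyFrameNode | null;
}

function makeFrame(
  label: string,
  isolatesHistory: boolean,
  parentFrame: ToyFrameNode | null,
): ToyFrameNode {
  return { label, isolatesHistory, parentFrame };
}

// Walk up from a frame until reaching one that isolates history, or the top.
// The frame returned is the outermost one that receives this frame's entries.
function historyOwner(frame: ToyFrameNode): ToyFrameNode {
  let owner = frame;
  while (!owner.isolatesHistory && owner.parentFrame !== null) {
    owner = owner.parentFrame;
  }
  return owner;
}

// Same tree as the diagram above.
const frameA = makeFrame("A", false, null);
const frameB1 = makeFrame("B1", false, frameA);
const frameC = makeFrame("C", false, frameB1);
const frameD = makeFrame("D", true, frameC);
const frameE = makeFrame("E", false, frameD);
const frameB2 = makeFrame("B2", true, frameA);

const owners = [frameB1, frameC, frameD, frameE, frameB2].map(
  (frame) => `${frame.label} -> ${historyOwner(frame).label}`,
);
```

For the diagram's tree this gives B1 and C bubbling up to A, E stopping at D, and D and B2 each keeping their own history.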

With this in mind…

Note: rather than allowing history to bubble up, we allow the history to be isolated, so as not to cause any breaking changes on the web, in the spirit of "don't break the web".

isolated-history seems clear enough to me, though something else like nested-history or suppress-history could work as well.