whatwg / html

HTML Standard
https://html.spec.whatwg.org/multipage/

Navigation sourceDocument for browser-initiated navigations #9133

Open domenic opened 1 year ago

domenic commented 1 year ago

Currently the navigate algorithm assumes it is always passed a sourceDocument. This is used for:

However, for browser UI-initiated navigations, we don't set any sourceDocument. So, that's bad; many algorithms are actually broken.

You might think we could treat browser UI navigations as if the source document is navigating itself. This works in some cases, e.g., it bypasses the allowed-by-sandboxing check. But it doesn't work for others; e.g., if the user navigates to a Content-Disposition: attachment URL, it should work, even if the page currently being displayed has sandboxing flags that disallow downloads.
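As a rough illustration of why "source document navigates itself" falls short for downloads, here is a minimal TypeScript sketch. The `SourceDocument` shape and the `downloadAllowed` function are hypothetical simplifications, not the spec's actual data structures; a null source document stands in for a browser UI-initiated navigation.

```typescript
// Hypothetical, simplified model; not the spec's data structures.
interface SourceDocument {
  // True when the document's sandboxing flags disallow downloads.
  sandboxedDownloads: boolean;
}

// Returns whether a navigation that turns into a download may proceed.
// A null sourceDocument stands for a browser UI-initiated navigation,
// which should not inherit restrictions from the page being displayed.
function downloadAllowed(sourceDocument: SourceDocument | null): boolean {
  if (sourceDocument === null) {
    return true;
  }
  return !sourceDocument.sandboxedDownloads;
}

const sandboxedPage: SourceDocument = { sandboxedDownloads: true };

// Treating the UI navigation as the page navigating itself blocks the download.
console.log(downloadAllowed(sandboxedPage)); // false

// Treating it as having no source document lets it through.
console.log(downloadAllowed(null)); // true
```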

So I think we need to allow sourceDocument to be null for browser UI navigations, and figure out replacements for all of the things derived from it. Ideas:

OK, so the hard ones are initiator origin and fetch client. Ideas:

I'm willing to put up a PR for this, but I'd love some confirmation that I'm on the right track here before I do. Any thoughts from @annevk @smaug---- @domfarolino @jakearchibald? Especially @annevk for the Fetch integration questions.

domenic commented 1 year ago
  • initiator origin: I think we want a new opaque origin for top-level about:blanks or javascript:'a string'.

Here is a fun fact! If you navigate to javascript:'a string' (via a bookmark), the resulting page has self.origin equal to the origin of the page you just left! Whereas, self.origin === "null" for about:blank. So, I guess we need to special-case the javascript: case to actually pull the initiator origin from the currently-active document!
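A minimal sketch of that special case, assuming a simplified origin model. The `Origin` type, `ActiveDocument`, and `initiatorOriginForUINavigation` are hypothetical names for illustration only, and the example URL is made up.

```typescript
// Simplified, hypothetical model of origins; not the spec's data structures.
type Origin =
  | { kind: "tuple"; serialized: string }
  | { kind: "opaque" };

interface ActiveDocument {
  origin: Origin;
}

// Picks an initiator origin for a navigation started from browser UI
// (URL bar, bookmark), following the idea floated in this thread:
// javascript: URLs borrow the currently-active document's origin,
// everything else gets an opaque origin.
function initiatorOriginForUINavigation(
  targetUrl: string,
  activeDocument: ActiveDocument
): Origin {
  // "javascript:" is 11 characters long; URL schemes are case-insensitive.
  const isJavaScriptUrl = targetUrl.slice(0, 11).toLowerCase() === "javascript:";
  if (isJavaScriptUrl) {
    return activeDocument.origin;
  }
  return { kind: "opaque" };
}

const current: ActiveDocument = {
  origin: { kind: "tuple", serialized: "https://example.com" },
};

console.log(initiatorOriginForUINavigation("javascript:'a string'", current).kind); // "tuple"
console.log(initiatorOriginForUINavigation("about:blank", current).kind); // "opaque"
```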

domfarolino commented 1 year ago

Yeah I like the idea of making the initiator origin opaque in this case in general, but I'm curious what other browsers are doing here, and if it's observable in any other ways we're missing. But...

> We don't want to tell the user "an opaque origin is attempting to navigate you to some-external-app://foo" if the user themselves typed the URL in the URL bar, so we might need a bit of special casing here.

...I can glean a little bit about what browsers are doing by looking at the external protocol case specifically. For example, when I navigate with browser UI to facetime:<number> in Chromium, I get "A website wants to open this application", whereas if I click a facetime: link on a website, I get a prompt that's specifically tailored to the initiator origin. At least Chromium is checking if the initiator origin is opaque and throwing up a more generic prompt in that case, which I think is exactly in the purview of a UA, so I personally think we won't need to special-case anything in the spec here. UAs already have to deal with opaque origins in this case if external protocols are accessed from other opaque origin contexts.
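The behavior described above could be sketched roughly as follows. This is a guess at the shape of the logic, not Chromium's actual code: the `Origin` type and `externalProtocolPrompt` are hypothetical, and only the generic prompt string comes from the observation above; the tailored wording is invented for illustration.

```typescript
// Simplified, hypothetical model of origins.
type Origin =
  | { kind: "tuple"; serialized: string }
  | { kind: "opaque" };

// Chooses the text of an external-protocol prompt from the initiator origin.
function externalProtocolPrompt(initiator: Origin, appName: string): string {
  if (initiator.kind === "opaque") {
    // Generic prompt, as observed for browser UI navigations.
    return "A website wants to open this application";
  }
  // Tailored wording is made up for this sketch.
  return initiator.serialized + " wants to open " + appName;
}

console.log(externalProtocolPrompt({ kind: "opaque" }, "FaceTime"));
console.log(
  externalProtocolPrompt({ kind: "tuple", serialized: "https://example.com" }, "FaceTime")
);
```

Because the opaque branch already has to exist for other opaque-origin contexts, nothing here needs to know whether the navigation came from browser UI.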

I'm not seeing any other interesting observable effects between using opaque origins vs origins derived in HTML, but I'd definitely like to know more about the Fetch implications. At least I don't think we have to worry about any "tainting", since it seems like the mode="navigate" case is handled specially for that, and I don't think we ever send the Origin header on normal GETs, so it seems fine to use an opaque origin there. Is it possible to trigger a POST (or any other request where we'd have to send the Origin header) with browser UI, as if there were no initiator document? My guess is no... from messing around on https://iframe-session-history.glitch.me/ (with POST requests specifically) I can't get the POST resubmission prompt to throw by clicking in the URL bar and hitting enter.


Fetch client being null and manually filling in the other bits sounds good to me. At least it won't mess with the fetch task queueing stuff since we're already using useParallelQueue = true, but I too would like to hear from @annevk.


> So, I guess we need to special-case the javascript: case to actually pull the initiator origin from the currently-active document!

Great find. It's a little surprising that we literally just run the JS in the currently active document of the target navigable, but I guess procedurally that does make sense, and I suppose is fine from a security perspective since the user is making all of these actions.

annevk commented 1 year ago

I think I mentioned this at some point and you assured me there was always a document. 😊

Manually filling in bits makes sense to me, but Fetch might also need some changes as it doesn't really do well when there is no client. I'm also not sure about the origin field.

noamr commented 1 week ago

I believe that this should be an initial about:blank document, similar to what's created in https://html.spec.whatwg.org/multipage/document-sequences.html#creating-a-new-browsing-context.

Looking at that function, it would populate the correct values here, except for "has transient activation". I think that when we create that initial about:blank, we should also set its last activation timestamp which would correct this.
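A rough sketch of that idea, under stated assumptions: the `SyntheticDocument` and `SourceSnapshotParams` shapes are hypothetical simplifications, the field names are illustrative, and the 5000 ms activation window is an assumed value chosen only for the example.

```typescript
// Hypothetical, simplified shapes; not the spec's data structures.
interface SyntheticDocument {
  // Milliseconds; -Infinity means "never activated".
  lastActivationTimestamp: number;
  sandboxedDownloads: boolean;
}

interface SourceSnapshotParams {
  hasTransientActivation: boolean;
  allowsDownloading: boolean;
}

// Assumed window length, for illustration only.
const TRANSIENT_ACTIVATION_DURATION_MS = 5000;

// Creates the stand-in initial about:blank document for a browser UI
// navigation, stamping it as activated "now" since the user acted directly.
function createSyntheticInitiator(now: number): SyntheticDocument {
  return { lastActivationTimestamp: now, sandboxedDownloads: false };
}

// Derives the snapshot values a navigation would read from its source document.
function snapshotSourceParams(doc: SyntheticDocument, now: number): SourceSnapshotParams {
  const activated =
    now >= doc.lastActivationTimestamp &&
    now < doc.lastActivationTimestamp + TRANSIENT_ACTIVATION_DURATION_MS;
  return {
    hasTransientActivation: activated,
    allowsDownloading: !doc.sandboxedDownloads,
  };
}

const initiator = createSyntheticInitiator(1000);
console.log(snapshotSourceParams(initiator, 1000).hasTransientActivation); // true

// Without the timestamp tweak, a fresh document would report no activation.
const untouched: SyntheticDocument = {
  lastActivationTimestamp: -Infinity,
  sandboxedDownloads: false,
};
console.log(snapshotSourceParams(untouched, 1000).hasTransientActivation); // false
```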

annevk commented 1 week ago

But what happens if you navigate an existing tab then (by clicking a bookmark or some such)? It would first navigate to an about:blank document and then continue? And we'd elide the about:blank document from history? Hmm.

noamr commented 1 week ago

> But what happens if you navigate an existing tab then (by clicking a bookmark or some such)? It would first navigate to an about:blank document and then continue? And we'd elide the about:blank document from history? Hmm.

"elide from history" => we would replace that about:blank with the actual document, like we do for the first navigation in the tab. But I'm not sure it's necessary to actually navigate to it; it's just a document for the purpose of sourceDocument and getting the source snapshot params.

noamr commented 1 week ago

origin: a new opaque origin?? Will this break Fetch and cause various security checks to fail?? Maybe we should pick the target URL's origin instead?

[edited] I looked through various specs, like service workers and fetch metadata. I couldn't find a single place where the origin of a browser-initiated navigation request is observable. So perhaps it doesn't really matter what we put there, and an opaque origin makes the most sense?