whatwg / html

HTML Standard
https://html.spec.whatwg.org/multipage/
Other
8.13k stars 2.67k forks source link

Sandboxed iframe, blocked download, and load event #7029

Open GPHemsley opened 3 years ago

GPHemsley commented 3 years ago

A sandboxed iframe without an allow-downloads flag will block a download.

However, it seems there is no consensus around what happens to the iframe after the download is blocked.

In Firefox, the iframe will navigate to about:blank and fire a load event, allowing detection that this has occurred (because about:blank was not the desired location).

In Chrome, however, it seems the blocked download will also prevent the load event from occurring, making it apparently impossible to detect that the download has been blocked. (It's not clear to me whether or not the iframe is navigated to about:blank during this.)

I would think that Firefox's behavior is the desired one, but the spec has so much going on during this sequence of events that I can't really tell.

annevk commented 3 years ago

Can you describe the complete setup? Is it essentially:

const frame = document.createElement("iframe");
frame.src = downloadURL;
document.body.appendChild(frame);

If so, I would indeed expect an about:blank document. A navigation that results in a download is essentially a no-op as far as the browsing context is concerned.

GPHemsley commented 3 years ago

Well you have to set the sandbox attribute, but yes:

async_test((t) => {
    const downloadURL = "resources/mime-type-sniffing.py?test_set=scriptable-mime-types&test_id=2195";

    const iframe = document.createElement("iframe");

    iframe.sandbox = "allow-same-origin";

    iframe.addEventListener("load", t.step_func(() => {
        t.done();
    }));

    iframe.src = downloadURL;

    document.body.appendChild(iframe);
});

This passes immediately in Firefox but times out in Chrome.

GPHemsley commented 3 years ago

Hmm... apparently it has nothing to do with the sandbox at all. The problem still manifests even when the download is presented to the user.

annevk commented 3 years ago

It is somewhat weird that Chrome doesn't fire a load event at all. @domenic might know why that is as he looked at this recently. Firefox does what I expect.

domenic commented 3 years ago

Per spec, downloads (just like 204s) will bail out of the navigate algorithm. This results in no navigation (not to about:blank) or load event, for successful downloads. It just spins up a separate algorithm to do the actual downloading.

For downloads blocked by sandboxing, the same thing occurs, just the algorithm bails out before doing the downloading.

So it looks like Chrome and Safari's behavior is the one that is correct per spec. And this also seems like it's probably by design; letting the sandboxed page know that you're preventing it from doing the bad thing seems unwise.

GPHemsley commented 3 years ago

Per spec, downloads (just like 204s) will bail out of the navigate algorithm. This results in no navigation (not to about:blank) or load event, for successful downloads. It just spins up a separate algorithm to do the actual downloading.

For downloads blocked by sandboxing, the same thing occurs, just the algorithm bails out before doing the downloading.

Can you provide links to the spec, so we're all on the same page? There are a lot of interacting sections involved here.

So it looks like Chrome and Safari's behavior is the one that is correct per spec. And this also seems like it's probably by design; letting the sandboxed page know that you're preventing it from doing the bad thing seems unwise.

The place where this is biting me is when I'm the one creating the iframe and polling to determine if a download occurred. (The only reason I'm using the sandbox to prevent the download is because I'm writing tests and I don't want a ton of download prompts to spawn.) I don't care whether or not the nested page knows what the parent page is doing; I just need the parent page to know what the nested page is doing.

annevk commented 3 years ago

@domenic I thought there was supposed to be a load event for an initial about:blank document?

domenic commented 3 years ago

Can you provide links to the spec, so we're all on the same page? There are a lot of interacting sections involved here.

process a navigate response step 7 checks allowedToDownload, which is computed according to the allowed to download algorithm back in step 16 of navigate.

I thought there was supposed to be a load event for an initial about:blank document?

On the iframe itself? Only if we are headed directly to the initial about:blank document, via e.g. <iframe src=""> or <iframe> or <iframe src="about:blank">. That's shared attribute processing steps step 4.

If we're headed somewhere else, but end up staying on the initial about:blank due to a 204 or download, then we go down the normal navigate algorithm path, which bails out with no event.

I think the idea is to avoid firing two load events for "normal" cases like <iframe src="https://example.com/"> (one for the initial about:blank, and one for example.com). While still firing one for cases like <iframe src="about:blank"> that never hit the navigate algorithm at all. That does mean that the middle case, where you do <iframe src="https://a-download.com/"> or <iframe src="https://a-204.com/">, fires no load event. We could patch that up for consistency, I guess?

On the Window, I don't think we fire an event for the initial about:blank at all. Create a new browsing context doesn't, and the usual place, at the end of parsing, does not run for the initial about:blank document.

annevk commented 3 years ago

If the rule is such that there's no load event for the initial document, except when you "explicitly" designate it to be about:blank, the behavior of the specification (and Chrome) makes sense to me.

It's not clear how we could fire a load event for the 204-or-download case other than keeping track of the initial document bit, which does not seem great. Also, it would end up revealing a time that is not otherwise exposed to content (the time you find out it's 204-or-download). (I was somewhat surprised to find out we expose 204-or-download at all, but there is no way around that.)

So I suppose the tests in question have to rely on a timeout, but perhaps there's a testing API that can be added to make this better.