whatwg / html

HTML Standard
https://html.spec.whatwg.org/multipage/
Other
8.11k stars 2.67k forks source link

Consider blocking downloads from a sandboxed `<iframe>`. #3236

Closed mikewest closed 4 years ago

mikewest commented 6 years ago

Blast from the past: https://lists.w3.org/Archives/Public/public-whatwg-archive/2013Feb/0010.html that came to mind because someone filed another feature request for it against Chrome (duped against crbug #539938).

It's currently possible to force a download by serving a file with a "Content-Disposition: attachment; filename=..." header. Notably, this mechanism can be used to download a file with minimal user interaction by including the resource to be downloaded in an IFrame. This holds even for sandboxed IFrames, as demonstrated by http://lcamtuf.coredump.cx/sandboxed.html (clicking that link will download a file, fair warning).

It seems consistent with the general thought behind the sandbox attribute that it should control downloads as well as the bits it already locks down. I'd propose adjusting the spec to include a sandboxed downloads flag, which, when present, would block all downloads from inside the frame (or, perhaps only require user confirmation?). This restriction could be lifted via an 'allow-downloads' keyword, if present in the sandbox attribute's token list.

Would folks in @whatwg/security be on board with this kind of change?

mikewest commented 6 years ago

(And also folks who should probably be in that group, like @dveditz, @johnwilander, @patrickkettner, @bzbarsky, @ckerschb, @arturjanc, @dbates-wk)

annevk commented 6 years ago

(I've invited those in your list that were not already invited to join the team. For anyone curious, the way we use teams is documented here: https://github.com/whatwg/meta/blob/master/GITHUB-TEAMS.md.)

bzbarsky commented 6 years ago

Seems plausible to me, especially if we can get some telemetry on how often this is used in practice right now. I expect not too much.

devd commented 6 years ago

I am worried about breaking existing clients during the transition. But maybe the telemetry can say how often its being used. I also think we could mitigate this risk by saying "allow downloads only if allow-top-navigation or allow-popups is set". If someone's setting those two flags, not allowing downloads seems weird. I dislike how the list of keywords for sandbox keeps growing.

lubin2010 commented 6 years ago

Yes, I agree that we need to collect some telemetry first.

"allow downloads only if allow-top-navigation or allow-popups is set"

@devd One concern is that malicious ads (in sandbox usually with 'allow-popups' to allow new pages to be opened) are automatically downloading say, APKs on user device, and it would not be able to block such behaviors.

devd commented 6 years ago

I forget the spec ... does the popup inherit the sandbox? If not, can't the popup allow the download?

bzbarsky commented 6 years ago

I forget the spec ... does the popup inherit the sandbox?

Yes, unless you set allow-popups-to-escape-sandbox.

mikewest commented 6 years ago

I also think we could mitigate this risk by saying "allow downloads only if allow-top-navigation or allow-popups is set".

  1. I'd prefer this be controlled by a clear and distinct mechanism. allow-downloads is simpler to explain and to use.
  2. It's not clear to me why "can download" should imply "can navigate". I just want to construct a blob from some data I have lying around and hand it to the user for safekeeping, it seems better to do so without also giving that context the ability to navigate (e.g. just sandbox="allow-download")
lubin2010 commented 6 years ago

I forget the spec ... does the popup inherit the sandbox? If not, can't the popup allow the download?

As @bzbarsky said, the popup inherits the sandbox unless you set allow-popups-to-escape-sandbox. But auto popup would be blocked by popup blockers that most browsers have, while auto download won't. So popups are guarded by user gestures (interactions), but not downloads.

xyaoinum commented 5 years ago

Hi folks, I’m looking to spec out the details and add that to Chrome. Here’s my proposal:

We are targeting to prevent only downloads that doesn’t involve a user gesture, which include synthetic <a download> link clicks, as well as navigations to non-web-renderable content without user gesture. And the click or navigation occurs in a sandboxed browsing context with download flag. “allow-downloads” can be used to turn off this restriction.

This implies: Alt-Click, Drag-And-Drop, Context Menu triggered download won’t be blocked. Top-level-navigation to a download won’t be blocked.

The reason to allow top level navigation to a download is: 1) it’s a clear mechanism to spec out, as this is implied by the fact that top navigation doesn't inherit the sandbox; on the other hand, popup to a download will be blocked as it inherit the sandbox. 2) sandbox already disallow top-level-navigation, and if the client opt in, it makes sense to trust it regardless whether it results in a download or not.

I'm happy to see what people are thinking.

johnwilander commented 5 years ago

We use “-by-user-activation” when that’s a requirement for the capability. Should we say “allow-download-without-user-activation” here?

xyaoinum commented 5 years ago

We use “-by-user-activation” when that’s a requirement for the capability. Should we say “allow-download-without-user-activation” here?

sounds reasonable to me

domenic commented 5 years ago

Hmm, aren't all sanbbox allow-X flags implicitly "without user activation"? Is there a reason to be more explicit with this one?

xyaoinum commented 5 years ago

Hmm, aren't all sandbox allow-X flags implicitly "without user activation"? Is there a reason to be more explicit with this one?

I looked at the spec, the only flags involving the notion of user activation is:

If we single out the 1st one, the "allow" keyword is implicitly "without user activation", but looking them together it's not necessarily so. But it does suggest another option which is to use 2 flags and keywords in the same way as "top-level navigation" does.

Here are some options and my thoughts: 1) "downloads without user activation flag", suppressed by "allow-downloads" keyword. -- Simple. Conform to existing practice if we can assume "allow" is implicitly "without user activation", which is questionable. 2) "downloads without user activation flag", suppressed by "allow-downloads-without-user-activation" keyword. -- More clear on what it controls. But by just looking at the keyword without seeing the restriction, it tends to imply it still won't allow download with user activation, although restriction is only on "without activation". 3) "downloads without user activation flag", suppressed by "allow-downloads" keyword. "downloads with user activation flag", suppressed by either "allow-downloads" or allow-downloads-by-user-activation"" keyword. -- Most flexible. Conform to existing practice. At the cost of growing flags / keywords by 2, and more breakage to the web compared with 1) and 2).

I personally is in favor of 1) for simplicity, but all options make sense to me. Happy to know what other people think.

domenic commented 5 years ago

I meant all the others. I.e. we say allow-scripts, not allow-scripts-without-user-activation. Etc. We don't involve the concept of user activation (even in the flag name) except in the one case of top navigation, where we have both and need to contrast the two.

xyaoinum commented 5 years ago

I meant all the others. I.e. we say allow-scripts, not allow-scripts-without-user-activation. Etc. We don't involve the concept of user activation (even in the flag name) except in the one case of top navigation, where we have both and need to contrast the two.

My proposal of the restriction was targeting at just to prevent download without user activation (the definition of that is either the navigation-to-download is without user activation, or <a download> click is synthetic), and this restriction is different from all the others. Of course this is also debatable - maybe we can prevent download irrespective of user activation. In that case, "allow-downloads" makes total sense.

domenic commented 5 years ago

Oh, I see. Sorry, I completely missed that key point from your original post... somehow. In that case I think including -without-user-activation makes perfect sense, and I apologize for wasting time on the back and forth.

sirdarckcat commented 5 years ago

Drive-by comment: would the sandboxed window have a way to know that the download failed?

xyaoinum commented 5 years ago

Drive-by comment: would the sandboxed window have a way to know that the download failed?

I'm thinking giving devtool messages, similar to what Chrome currently does when blocking the top-navigation, popups, etc.

xyaoinum commented 5 years ago

cc @ehsan : I'm interested to see if FireFox is interested in implementing, so that the spec change review can be moved forward. Unconfirmed bug on Firefox.

annevk commented 5 years ago

@xyaoinum if we think it's reasonable to disable this retroactively, why do we need allow-download-without-user-activation? Are there web developers asking for that capability?

xyaoinum commented 5 years ago

@xyaoinum if we think it's reasonable to disable this retroactively, why do we need allow-download-without-user-activation? Are there web developers asking for that capability?

The opt-in feature was originally proposed for compat concern (i.e. was not explicitly asked by any web developer). But from Chrome's statistics collected later, we do have seen both legitimate & abusive cases:

Legitimate: -- Site sandbox embedding a 3rd party site that converts (or artificially delays) & downloads files. Even though it's triggered by a click, the converting process takes longer than user gesture expiration time (currently 5s in Chrome). -- Site triggering download by embedding it in a dynamically created sandboxed iframe which is 1st party. The reason to do this is perhaps to limit its privilege.

Abusive: -- Some URL shortening service: before user go to the destination URL, the transition page is sandbox embedding an advertiser site, and some of those do automatic download. It's hard to deterministically see a download as the advert selection is sort of random.

The "drive-by download in sandbox" behavior is seen on 0.002% page loads, and it's estimated that about 70% of those are abusive cases. Therefore, I think the best thing to do is to intervene by default, and give options to lift the restriction.

dveditz commented 5 years ago

The "drive-by download in sandbox" behavior is seen on 0.002% page loads, and it's estimated that about 70% of those are abusive cases.

@xyaoinum 0.002% is an impressively small number, but "page loads" isn't the most relevant measure here. Downloading files isn't all that common compared to typical surfing around. Why just weed out the sandboxed drive-bys? Is it that they are such an insignificant portion of downloads that we can safely kill it? If that's the reason are we really helping enough to be worth adding new sandboxing attribute? Are there a lot of legit non-drive-by sandboxed downloads? Could we just kill all sandboxed downloads (unless there's an attribute) rather than making this distinction about drive-bys?

Can you share numbers on: % pageloads with any kind of download % pageloads with drive-by downloads % pageloads with sandboxed downloads (any type) estimated % abusive for any of the above

xyaoinum commented 5 years ago

@dveditz Here are the % of page loads and % of abusive: In sandbox & no gesture: 0.002%; 70% abusive In sandbox: 0.003%; 50% abusive (Note: After a brief scan, I essentially assume those downloads initiated with gesture are all legitimate, because there was a click after all.) No gesture: 0.2%; <1% abusive (Note: The reason it's so low has to do with how user gesture is determined. There are a lot of cases where download is triggered via a real click, and then a new iframe is created for doing the download. In this case, the gesture is not propagated to the new frame per Chrome's user activation mechanism.) Download: 0.7%; <<1% abusive

I believe adding the download restriction (vs doing nothing) is still wanted as it mitigates potential security risks. I think the trade-off between blocking "sandboxed (any type)" and "sandboxed drive-by" is: blocking "sandboxed (any type)" would introduce more breakage without blocking many more abusive cases if any. The good thing is that it's a shorter/easier-to-interpret attribute, where the meaning of it won't be affected by how browser determine user activation that's probably going to change over time.

There's a web compatibility guideline for Chrome, which says 0.001% page loads is small but non-trivial. I'm personally more inclined towards doing "sandboxed drive-by" for compatibility reason, but both are plausible to me.

What do you think?

xyaoinum commented 5 years ago

cc @annevk

xyaoinum commented 5 years ago

After some discussion, I changed my inclination and I'm proposing to make the restriction stronger from "disallowing downloads without gesture" to just "disallow downloads", and the allow attributes would then becomes "allow-downloads".

It's a simpler policy compared to the original one which makes this distinction about drive-bys, and it would still have small impact, although the latest data in Chrome M77 Stable indicates the impact is 0.0044% (download without gesture) vs 0.0059% (no distinction), a bit higher than previously stated.

@dveditz , @annevk : happy to see Mozilla's position. The pull request has also been updated.

xyaoinum commented 4 years ago

ping @annevk and @dveditz again.

annevk commented 4 years ago

@xyaoinum I think the thing I'm missing is justification for the new value to opt out of the restriction. @dveditz asked about this above and I don't think there was an answer unless I overlooked it.

xyaoinum commented 4 years ago

@annevk : There are both legit cases and abusive cases right now - roughly half and half. To restrict by default would help with abusive cases like automatic download from advertisement. And I can imagine some embedders want to just allow download while restricting other behaviors. e.g. embedded video converter sites .

annevk commented 4 years ago

I guess I'd like to see something stronger. How many embedders would adopt this new value?

xyaoinum commented 4 years ago

@annevk : We reached to a few sites (15+) on the top of the list and only developers from AOL and Google responded. And they both have adopted / are ready to adopt the change.

I think many embedders haven't adopted the new value only because they're not aware of the change, as I see no reason they wouldn't adopt if download is expected to be a legitimate use case on the embedded site. And the cost of adopting is also trivial.

I feel standardizing this is a step forward to make more developers aware of it.

xyaoinum commented 4 years ago

@annevk / @dveditz : Want to confirm if this aligns with your thoughts: Compared to allow-downloads, were you saying you would be in favor of allow-downloads-with-gesture or "no allow- token at all"? And what kind of justification do you need to answer "How many embedders would adopt this new value"?

I think we could do the latter options and let those intervened download to be resumable through a browser UI as a last resort to the breakages (repeated below for reference).

~0.002% page loads are having those legitimate download without gesture in sandbox:

xyaoinum commented 4 years ago

friendly ping @annevk and @dveditz . Please let me know if you prefer other options over the proposed option - to block all downloads in sandbox and to use "allow-downloads" to opt-out.

Or, as Domenic pointed out in the latest spec review, if you don't have strong objection but are just uncertain about how embedders would react, maybe it's reasonable for Chrome trying to ship the restriction, and if it's web-compatible we can then proceed with the spec change?

jkarlin commented 4 years ago

FWIW, I'm in favor of starting with allow-downloads which makes the frame like any other frame today, as that covers the use cases that we've come across. I'm happy to add allow-downloads-with-gesture down the road if start wanting it.

jashkenas commented 4 years ago

Hi folks 👋,

(With all due apologies for posting on a closed ticket), over at Observable, we run a platform that allows our notebook authors to write arbitrary JavaScript within sandboxed iframes, and folks that use Chrome Canary are beginning to be affected by this change. I imagine that the JSBins and Codepens and their friends and family will begin running into this soon as well.

In general, we want to give our users as much of the power of the web platform as possible (in a sandboxed iframe), while maintaining the basic security rules that make normal web pages less dangerous. Not allowing drive-by downloads is definitely one of them!

Because the allow-downloads flag is all or nothing as specified, we either are forced to give our notebook authors the ability to write drive-by downloads, or we cannot allow them to offer downloads within the notebook at all — and we have to provide alternative APIs to trigger a download by other mechanisms, such as postMessage()-ing a Blob up to the parent frame.

Any interest in giving allow-downloads-by-user-activation a second chance?

jkarlin commented 4 years ago

Thanks for the report and question. Are you confident that the required user gesture would cover the use cases your users have? Could you describe some example use cases?

jashkenas commented 4 years ago

Thanks for the report and question. Are you confident that the required user gesture would cover the use cases your users have? Could you describe some example use cases?

Happy to.

In general, we would like to allow our users to create notebooks that allow readers to download computed content — images, CSV files, JSON, generated SVGs, and so on. We currently provide a DOM.download() function in the standard library, which wraps up the pattern of taking a JS Blob, and connecting it to an HTML button for download.

Here’s a coronavirus tracking graphic that allows readers to download the source data that backs it as a TSV formatted text file: https://observablehq.com/@elaval/coronavirus-worldwide-evolution

Here’s an npm package dependency visualizer notebook that allows you to download the resulting graphviz image as either an SVG or PNG file: https://observablehq.com/@mbostock/package-dependencies

Here’s a fancy exploration of building buckyball models out of toilet paper tube strips, which includes an interactive simulation, which you can manipulate and then download the currently configured graph as a JSON file: https://observablehq.com/@rreusser/toiletpaperfullerenes-and-charmin-nanotubes

Here’s a "chart-o-matic", where you can edit the textarea to generate a different bubble map, and then download the resulting image as a PNG or SVG: https://observablehq.com/@mbostock/bubble-o-matic

We would like to allow authors to create download links, buttons and interactions in their sandboxed iframes, without also allowing malicious drive-by downloads, triggered immediately upon loading a notebook. Here’s an example of a notebook where an author is using a simulated "click" event to force a download of a small SVG: https://observablehq.com/@msurguy/untitled/3 We’d like to use something like allow-downloads-by-user-activation to prevent this.

annevk commented 4 years ago

Since this is closed, it would be better if this moved into a new dedicated issue for that feature.

jashkenas commented 4 years ago

@annevk Is that something you’d like me to do? Or is it better if WhatWG contributors open new tickets for previously debated issues?

annevk commented 4 years ago

@jashkenas since you have the novel use case it'd be great if you could do it. When there's new information it's always cool to open a new issue.

jashkenas commented 4 years ago

Discussion moved to: https://github.com/whatwg/html/issues/5513

photopea commented 4 years ago

Guys, I am the author of www.Photopea.com . It is a photo editor, which is used by about 8 million people per month.

There are several hundreds of schools, who use it at their website in