whatwg / html

HTML Standard
https://html.spec.whatwg.org/multipage/

Origin of blob: documents doesn't match implementations #2759

Open bzbarsky opened 7 years ago

bzbarsky commented 7 years ago

What is the origin of a document loaded from a blob: URL without sandboxing?

Stepping through the "For Document objects" list at https://html.spec.whatwg.org/multipage/origin.html#origin (sadly no direct way to link it):

  1. No sandboxing, does not apply.
  2. URL is not data:, does not apply.
  3. Scheme is not a network scheme, does not apply.
  4. Not about:blank, does not apply.
  5. Not about:blank, does not apply.
  6. Not javascript:, does not apply.
  7. Not srcdoc, does not apply.
  8. Fall through to DOM behavior, give it a unique origin.
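The walk-through above can be sketched as a first-match check. This is a simplified model of the list as described in this comment, not the spec text: `DocumentInfo`, `schemeOf`, and `firstMatchingStep` are hypothetical names, the set of network schemes is assumed to be http, https, and ftp, and the two about:blank steps are collapsed into one check.

```typescript
// Hypothetical, simplified description of a document for this sketch.
interface DocumentInfo {
  url: string;
  sandboxed: boolean; // sandboxed into a unique origin
  isSrcdoc: boolean;  // an iframe srcdoc document
}

// Assumption: the network schemes relevant to step 3.
const NETWORK_SCHEMES: string[] = ["http", "https", "ftp"];

// Returns the part of the URL before the first colon, lowercased.
function schemeOf(url: string): string {
  const colon = url.indexOf(":");
  return colon < 0 ? "" : url.slice(0, colon).toLowerCase();
}

// Returns the number of the first step in the list that applies.
function firstMatchingStep(doc: DocumentInfo): number {
  const scheme = schemeOf(doc.url);
  if (doc.sandboxed) return 1;
  if (scheme === "data") return 2;
  if (NETWORK_SCHEMES.indexOf(scheme) !== -1) return 3;
  if (doc.url === "about:blank") return 4; // steps 4 and 5 both concern about:blank
  if (scheme === "javascript") return 6;
  if (doc.isSrcdoc) return 7;
  return 8; // fallback: unique origin
}

// Example blob: URL with a made-up identifier.
const blobDoc: DocumentInfo = {
  url: "blob:https://example.com/9115d58c-bcda-ff47-86e5-083e9a215304",
  sandboxed: false,
  isSrcdoc: false,
};

console.log(firstMatchingStep(blobDoc)); // 8
```

Although the blob: URL embeds an https origin, its scheme is "blob", so the network-scheme step never matches and the document falls through to the unique-origin case.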

Needless to say, this is not how blob: actually works in browsers, nor how anyone expects it to work. @mikewest @annevk @domenic

What Gecko does in practice, I believe, is this: when you create a blob: URL, the association from the URL to the Blob also includes an association to the origin of the thing that created the blob. A load from the blob: URL gets that origin. In particular, if you load a subframe from a blob: URL and then set document.domain (or do it in the opposite order, either way), you are still "same origin-domain" with the blob: document you loaded. Note that this is similar to, but not the same thing as, javascript: and about:blank origin inheritance.
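A minimal sketch of that bookkeeping, assuming the store keeps a reference to the creator's origin rather than a copy. `DocOrigin`, `BlobUrlStore`, and `originForLoad` are hypothetical names for illustration and do not correspond to Gecko internals; the URL format is also made up.

```typescript
// Hypothetical tuple origin; `domain` models the effect of document.domain.
interface DocOrigin {
  scheme: string;
  host: string;
  port: number | null;
  domain: string | null;
}

interface BlobUrlEntry {
  blobData: string;
  creatorOrigin: DocOrigin;
}

class BlobUrlStore {
  private entries: { [url: string]: BlobUrlEntry } = {};
  private counter = 0;

  // Registers the blob and remembers who created it.
  createObjectURL(blobData: string, creatorOrigin: DocOrigin): string {
    this.counter += 1;
    const url =
      "blob:" + creatorOrigin.scheme + "://" + creatorOrigin.host + "/" + this.counter;
    // Stores a reference to the creator's origin object, not a copy.
    this.entries[url] = { blobData: blobData, creatorOrigin: creatorOrigin };
    return url;
  }

  // The origin a document loaded from `url` would get in this model.
  originForLoad(url: string): DocOrigin | null {
    const entry = this.entries[url];
    return entry ? entry.creatorOrigin : null;
  }
}

const blobStore = new BlobUrlStore();
const parentOrigin: DocOrigin = {
  scheme: "https",
  host: "www.example.com",
  port: null,
  domain: null,
};
const blobUrl = blobStore.createObjectURL("<p>hello</p>", parentOrigin);
const frameOrigin = blobStore.originForLoad(blobUrl);

// Models the parent running `document.domain = "example.com"` after the load.
parentOrigin.domain = "example.com";

console.log(frameOrigin === parentOrigin); // true
```

Because the frame shares the creator's origin object in this model, a later document.domain change is visible on both sides, which is consistent with the "either order" behavior described above.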

domenic commented 7 years ago

This seems to fit into a general known issue, although I'm not sure people have specifically understood the sandboxing case before:

@mkruisselbrink

It really would be good to get this fixed.

bzbarsky commented 7 years ago

Oh, sandboxing was explicitly brought up in those issues. I wish we had an actual searchable issue database; right now just finding which repo issues are reported against is a total guess. :(

domenic commented 7 years ago

I mean, this is somewhat of a special case as the definition of origin of URL is spread across two specs right now. Over time we should be able to consolidate that into one.

Not sure about how or whether we can deal with the definition of "origin" in general being spread out across HTML/URL/DOM though. I guess I have some ideas but I'll move them to another thread...

bzbarsky commented 7 years ago

Also, it sort of fits into that issue, but note that the basic problem here is that we never even reach the "origin of the URL" bit at all.

A second problem is that even if we did, that might not match UAs in terms of document.domain handling. It looks like I have a testcase written for it already, so maybe I even filed it before... http://web.mit.edu/bzbarsky/www/testcases/security/blob-iframe-document-domain.html shows that Firefox treats the blob: as same origin-domain but Chrome does not. So I'm really not sure what Chrome implements exactly, and in particular how it manages to have nonce origins match without, apparently, the origins being the same in general (because the document.domain bits don't match).

mikewest commented 7 years ago

Hrm. It looks like Chrome is doing the wrong thing here, and I agree with @bzbarsky that we ought to be following Firefox's behavior. I haven't looked at the code, but I assume we're just creating a new origin object rather than aliasing the existing origin, so the domain check fails. Filed https://bugs.chromium.org/p/chromium/issues/detail?id=733351 to track.
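The difference between aliasing and copying an origin can be sketched as follows. This only illustrates the guess in the comment above and is not confirmed Chrome behavior; `TupleOrigin`, `sameOrigin`, and `sameOriginDomain` are hypothetical names, and the checks are a simplified version of the same origin and same origin-domain comparisons for tuple origins.

```typescript
// Hypothetical tuple origin; `domain` is set by document.domain, null otherwise.
interface TupleOrigin {
  scheme: string;
  host: string;
  port: number | null;
  domain: string | null;
}

// Same origin: scheme, host, and port match; domain is ignored.
function sameOrigin(a: TupleOrigin, b: TupleOrigin): boolean {
  return a.scheme === b.scheme && a.host === b.host && a.port === b.port;
}

// Same origin-domain: matching non-null domains, or same origin with both domains null.
function sameOriginDomain(a: TupleOrigin, b: TupleOrigin): boolean {
  if (a.scheme === b.scheme && a.domain !== null && a.domain === b.domain) {
    return true;
  }
  if (sameOrigin(a, b) && a.domain === null && b.domain === null) {
    return true;
  }
  return false;
}

const creator: TupleOrigin = {
  scheme: "https",
  host: "www.example.com",
  port: null,
  domain: null,
};

// Aliasing: the blob: document shares the creator's origin object.
const aliased: TupleOrigin = creator;

// Copying: the blob: document gets a fresh object with the same values.
const copied: TupleOrigin = {
  scheme: creator.scheme,
  host: creator.host,
  port: creator.port,
  domain: creator.domain,
};

// Models the creator running `document.domain = "example.com"` after the load.
creator.domain = "example.com";

console.log(sameOrigin(creator, copied));        // true
console.log(sameOriginDomain(creator, aliased)); // true
console.log(sameOriginDomain(creator, copied));  // false
```

In this model the copied origin still compares as same origin, but the domain check fails because only the creator's object saw the document.domain change.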