Sandboxed data: URI in a localhost page should be a Secure Context

jwatt commented 8 years ago

It seems like the sandboxing handling in the spec doesn't handle the case of a sandboxed data: document being embedded in a trusted but not-https page such as a http://localhost page.

The localhost page itself will be considered a Secure Context because the "Is origin potentially trustworthy?" algorithm special cases localhost. However, nowhere as far as I can tell sets the HTTPS state of a localhost page to 'modern'. The only place I'm aware of a spec saying to set the HTTPS state to a specific value (rather than inherit it) is where the Fetch spec sets it to 'modern', for https only:

https://fetch.spec.whatwg.org/#http-network-fetch

If the http://localhost page creates a sandboxed iframe to load a data: URI, the sandboxed document will have no HTTPS state to inherit. We will fall back to taking the origin of settings object’s creation URL. In the case of data: URIs that doesn't help us though, since the origin of a data: URI is a new opaque origin:

https://url.spec.whatwg.org/#concept-url-origin

So the sandboxed data: document will end up being Not Secure.

jwatt commented 8 years ago

In a note the spec does say "The origin of data: and javascript: URLs, on the other hand, is an opaque identifier, which will not be considered potentially trustworthy."

annevk commented 8 years ago

Yeah, Chrome would like to get rid of data URLs, mostly. So they are all kinds of useless in Chrome.

mikewest commented 8 years ago

Right. The spec does handle data:, sandboxed or otherwise, by excluding it as a secure context. That was intentional on my part. :)

I recognize that folks might disagree with that categorization, but I think it's unlikely that Chrome's behavior will change here, until and unless we change our handling of data: entirely to match Firefox's (which is itself unlikely). srcdoc will work in both Chrome and Firefox, and has the advantage of being clear with regard to ownership, responsibility, and origin.

annevk commented 8 years ago

I think the problem is mostly top-level browsing contexts and service such as https://software.hixie.ch/utilities/cgi/data/data which make use of that. For <iframe>, Worker, and such, some of Gecko's leadership seems to agree that we should align with Chrome long term.

bzbarsky commented 8 years ago

The spec does handle data:, sandboxed or otherwise, by excluding it as a secure context.

One problem is that it does that inconsistently, afaict. A data: iframe on an https page will be considered a secure context, whether it's sandboxed or not, because it will inherit the HTTPS state. A data: iframe on a localhost page will not be considered a secure context (certainly when sandboxed; there's obviously disagreement as to what it's origin is when not sandboxed). This seems broken to me.

This applies to other things that inherit HTTPS state as well; it just happens that blob: has a URL origin that's useful and sandboxed about:blank is not observable (I think; I'd have to check pretty carefully), so you can't tell that it's a secure context when on https but not when on localhost.

bzbarsky commented 8 years ago

and sandboxed about:blank is not observable

Upon further experimentation, totally false at least in Chrome. Simple testcase:

<script>
  onmessage = function(e) {
    alert(e.data);
  }
</script>
<iframe sandbox="allow-scripts"
        src="javascript:parent.postMessage(window.isSecureContext, '*')">
</iframe>

In Chrome, loading this from file:// this alerts true. Per spec as currently written, and assuming the script runs, it should alert false, as far as I can tell: There is no https state, in https://w3c.github.io/webappsec-secure-contexts/#settings-object step 4 origin should be an opaque identifier, the origin of the creation URL in this case is an opaque identifier, so "Is origin potentially trustworthy?" should return "Not Trustworthy".

In Gecko, in this situation the script does not run, which I think is correct per HTML spec. In particular, https://html.spec.whatwg.org/multipage/browsers.html#javascript-protocol says to no-op if the origin of the source browsing context (a bogus concept; filed https://github.com/whatwg/html/issues/1130, but the sane thing there would be the origin of the source document) doesn't match the origin of the thing the script would run against. For an iframe load, the source thing should be the document containing the <iframe> tag, though that doesn't seem to be clearly specified. Filed https://github.com/whatwg/html/issues/1131 on that.

Anyway, relying on the fact that we can never exfiltrate information from a sandboxed about:blank seems pretty fragile to me...

mikewest commented 8 years ago

I suspect Chrome's handling of javascript: in an iframe@src attribute is wonky. Thanks for the test case, @bzbarsky!
Whether data: inherits the HTTPS state depends very much on whether data: inherits an origin. In Chrome, I believe we inherit neither. That said, as specced I think you're correct that it's inconsistent. We should fix that.

mikewest commented 7 years ago

I've reworked the algorithm a bit, and I think it more correctly addresses the cases you raised, @bzbarsky. I'd appreciate it if y'all could skim it again to verify that it's as internally consistent as I think it might be. :)

bzbarsky commented 7 years ago

@mikewest That doesn't address the sandboxed data: URI issue, nor the difference in data: behavior between https and http://localhost as far as I can tell.... What am I missing?

mikewest commented 7 years ago

That doesn't address the sandboxed data: URI issue

Sandboxed data: is dealt with in step 5.4 of https://w3c.github.io/webappsec-secure-contexts/#settings-object: the sandboxed origin browsing context flag is set, so we hit https://w3c.github.io/webappsec-secure-contexts/#is-url-trustworthy with the context's creation URL. That algorithm returns "Not Trustworthy" in step 1.

In other words, we should now be consistently treating data: as non-secure.

nor the difference in data: behavior between https and http://localhost as far as I can tell.... What am I missing?

Hrm. I'm not sure? I might have tricked myself into thinking that the flow makes sense. Let's walk through the way that I think two examples work. Maybe I'm missing some steps in the algorithm?

Given http://127.0.0.1/ framing an unsandboxed about:srcdoc:

The top-level document will be secure: it's not sandboxed, and its origin ("http", "127.0.0.1", 80) is considered secure by https://w3c.github.io/webappsec-secure-contexts/#is-origin-trustworthy.
The nested document will be secure: its ancestors are secure, and its origin (also (http, 127.0.0.1, 80)) is likewise considered secure.

Given http://127.0.0.1/ framing a sandboxed about:srcdoc with the allow-secure-contexts token, but not the allow-same-origin token:

No change in the top-level; it's still secure.
The nested document will be secure: its ancestors are secure, and since its sandboxing flag is set, we look at its creation URL via the new https://w3c.github.io/webappsec-secure-contexts/#is-url-trustworthy. That returns "Potentially Trustworthy" in step 2.

These both seem like they work, and it seems like they'd give sane results for about:blank and data as well. WDYT?

bzbarsky commented 7 years ago

In other words, we should now be consistently treating data: as non-secure.

Well, consistently as long as it's sandboxed.

Given http://127.0.0.1/ framing an unsandboxed about:srcdoc:

I was talking about data:, not about:srcdoc.

Reading things more carefully, I guess the point is that secure https state is no longer sufficient to be treated as secure; this is the major change, right? So in Chrome unsandboxed data: in an HTTPS page would be considered insecure because of step 5.5, but per HTML spec as currently written it would be considered secure (because the origin inherits). OK, I guess I can live with this.

mikewest commented 7 years ago

I guess the point is that secure https state is no longer sufficient to be treated as secure; this is the major change, right?

Right.

OK, I guess I can live with this.

Can I modify it in a way that you'd be happier with?

bzbarsky commented 7 years ago

Well, I'd be happier if we treated data: more like srcdoc, but I understand that there are some philosophical disagreements there. ;)

mikewest commented 7 years ago

Good enough for me, thanks!

bzbarsky commented 7 years ago

OK, I think I do have a problem here, still.

In my opinion, the way secure contexts should be written is that sandboxed data is considered secure if and only if unsandboxed data is considered secure. Right now that's not the case.

bzbarsky commented 7 years ago

@jwatt @mikewest Looks like I can't reopen the issue... I'm not super-happy just filing a new one, but I can do that, of course.

mikewest commented 6 years ago

At this point HTML considers data: to be an opaque origin regardless of sandbox state, so I think we're good to go on this as-is.

annevk commented 4 years ago

Note that I reopened this in #69.

w3c / webappsec-secure-contexts

Sandboxed data: URI in a localhost page should be a Secure Context #26