w3c / webappsec-csp

WebAppSec Content Security Policy
https://w3c.github.io/webappsec-csp/

Clarify meaning of 'self' for CSPs in meta tags of local-scheme documents #459

Open antosart opened 3 years ago

antosart commented 3 years ago

I couldn't find out from the spec what the keyword 'self' should mean when the policy is served inside the meta tag of a local-scheme (data: or about: or blob:) document.

I believe it would make sense in those cases for 'self' to match nothing.

In fact, this test https://wpt.fyi/results/content-security-policy/frame-src/frame-src-self-unique-origin.html?label=experimental&label=master&aligned asserts that 'self' in a data: url matches nothing.
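
One rough way to picture the proposed behaviour is the Python sketch below. It is not the spec's matching algorithm: it treats any document whose URL is not http(s) as having an opaque origin, and lets 'self' match nothing for it. The helper names `tuple_origin` and `self_matches` are hypothetical, and real CSP matching details (such as scheme upgrades) are deliberately left out.

```python
from urllib.parse import urlsplit

# Default ports, so that https://example.com and https://example.com:443 compare equal.
DEFAULT_PORTS = {"http": 80, "https": 443}


def tuple_origin(url):
    """Return (scheme, host, port) for http(s) URLs, or None to model an opaque origin."""
    parts = urlsplit(url)
    if parts.scheme not in DEFAULT_PORTS:
        return None  # data:, about:, blob:, file: ... are treated as opaque in this sketch
    return (parts.scheme, parts.hostname, parts.port or DEFAULT_PORTS[parts.scheme])


def self_matches(document_url, request_url):
    """Simplified 'self' check: exact origin equality, and opaque never matches."""
    self_origin = tuple_origin(document_url)
    if self_origin is None:
        return False  # proposed behaviour: 'self' in a data: document matches nothing
    return tuple_origin(request_url) == self_origin
```

With this model, a policy in `https://example.com/` allows `https://example.com/a.js` through 'self', while the same policy in a data: document allows nothing, not even another data: URL.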

antosart commented 3 years ago

(Related: although it's not a local scheme, I think it would also make sense for 'self' in file: documents to match nothing.)

annevk commented 3 years ago

See also #405.

Per https://w3c.github.io/webappsec-csp/#policy-self-origin it seems like this is intended to work. I also don't think we should treat an opaque origin from a local scheme differently from an opaque origin due to a sandboxed HTTPS document.

mikewest commented 3 years ago

We did do some work a while back to make this functional. Hrm.

  1. It seems clear that it ought to work for about: documents that have a non-opaque origin (that is, I think 'self' inside an about:srcdoc document embedded in https://example.com/ should mean https://example.com/).
  2. I guess I'm fine with 'self' blocking everything inside data: documents (the alternative would be to allow only data: resources, I guess).
  3. I don't have a strong opinion about sandboxed about: documents. It seems important for us to agree on the behavior between browsers, but both a) blocking everything and b) using whatever the origin would have been absent sandboxing seem like defensible paths.

What do Gecko and WebKit do here?

antosart commented 3 years ago

> 1. It seems clear that it ought to work for about: documents that have a non-opaque origin (that is, I think 'self' inside an about:srcdoc document embedded in https://example.com/ should mean https://example.com/).

Regarding 1., I think I can agree for about:srcdoc documents, since they are in some sense content inside the main document. But what about about: popups? I think it would be fine for 'self' to match nothing for them.

Note that I am talking only about 'self' in policies parsed from meta elements contained in the local-scheme document. It is clear that for policies inherited from the main document, we also inherit the meaning of 'self'.

annevk commented 3 years ago

An (initial) about:blank popup inherits the origin from its (creator) navigator. It will at least inherit a policy (as otherwise it's an escape) and if that policy contains self it will need to make sense so I don't see why it would then be different for <meta> in the same document.

mikewest commented 3 years ago

If you mean manually opening a new window and/or navigating manually to about:blank via the address bar, then I agree that 'self' is somewhat meaningless. As @annevk notes, windows opened from other documents will inherit in the cases you're working through via the policy container implementation, and it seems pretty reasonable to me for 'self' to have the same meaning when inherited as when specified by <meta>.

antosart commented 3 years ago

Ok, thanks for the clarification. Let me try to summarize:

  1. I think I am convinced that 'self' in about: documents should match the origin (which is the initiator's origin). I will write some WPT for this.

  2. As for data:, 'self' should match nothing and we already have WPTs covering it.

  3. I guess blob: should behave like about:.

  4. There remains the question of sandboxed documents with opaque origins. I guess that is independent of whether the document has a local scheme. The current behaviour (e.g. https://wpt.fyi/results/content-security-policy/frame-src/frame-src-sandboxed-allowed.html?label=experimental&label=master&aligned), aligned between browsers, seems to be to ignore the sandbox and match the origin-before-sandbox. This also makes sense to me.
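
The four points above can be sketched in Python as a single resolution step. This is only an illustration of the summary, not spec text: the `Doc` class, its fields, and `resolve_self` are hypothetical names, origins are modelled as plain strings, and `None` stands for "'self' matches nothing".

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlsplit


@dataclass
class Doc:
    url: str
    initiator_origin: Optional[str] = None  # origin of the creator/embedder, if any
    sandboxed: bool = False                 # kept only to show that it is ignored


def resolve_self(doc):
    """Return the origin that 'self' stands for, or None if it matches nothing."""
    parts = urlsplit(doc.url)
    if parts.scheme == "data":
        return None                      # point 2: data: matches nothing
    if parts.scheme in ("about", "blob"):
        return doc.initiator_origin      # points 1 and 3: use the initiator's origin
    # Point 4: the sandbox flag is ignored, so the origin-before-sandbox is used.
    return f"{parts.scheme}://{parts.netloc}"
```

For example, `resolve_self(Doc("about:srcdoc", initiator_origin="https://example.com"))` gives `"https://example.com"`, and a sandboxed `https://example.com/page` resolves to the same origin as an unsandboxed one.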

(Note: Firefox's and Chrome's behaviour seems to differ on 1., see https://wpt.fyi/results/content-security-policy/meta/sandbox-iframe.html?label=experimental&label=master&aligned [which fails on Firefox regardless of whether the srcdoc iframe is sandboxed]. But from @annevk's comments I understand he also agrees that 1. above is the expected behaviour and would like to converge there.)

I still believe it would make sense to clarify all of these somewhere in the spec.

annevk commented 3 years ago

I think that makes sense to align on given the status quo in implementations. Thanks for writing tests!

Having said that, I'm not super happy with how CSP seems to largely use URLs to compute authority rather than use the authority of the environment. In particular, data: URLs should not be special cased here, this should fall out of them creating an opaque-origin environment and apply to all such environments. And ideally sandboxing would have been done as a step we apply before CSP so that it too would create an equivalent opaque-origin environment for the purposes of the remainder of CSP and other policies. Instead CSP seems to ignore the environment and just look at the URL, which doesn't seem exactly sound.
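
To make the distinction concrete, here is a small Python sketch contrasting the two ways of computing what 'self' refers to. Everything in it is an assumption for illustration: `Environment`, `self_from_url`, and `self_from_environment` are hypothetical names, and an opaque origin is modelled as `None`.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlsplit


@dataclass
class Environment:
    url: str
    origin: Optional[str]  # serialized origin, or None when the origin is opaque


def self_from_url(env):
    """URL-based: derive 'self' from the URL and ignore the environment's origin."""
    parts = urlsplit(env.url)
    if parts.scheme in ("http", "https"):
        return f"{parts.scheme}://{parts.netloc}"
    return None


def self_from_environment(env):
    """Environment-based: 'self' is the environment's origin; opaque matches nothing."""
    return env.origin


# A sandboxed HTTPS document and a data: document both have opaque origins.
sandboxed = Environment(url="https://example.com/widget", origin=None)
data_doc = Environment(url="data:text/html,<p>hi</p>", origin=None)
```

The two functions agree for the data: document (both give `None`) but disagree for the sandboxed HTTPS document: the URL-based version still returns `"https://example.com"`, while the environment-based version treats it like any other opaque-origin environment.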

antosart commented 3 years ago

> Having said that, I'm not super happy with how CSP seems to largely use URLs to compute authority rather than use the authority of the environment.

I am not sure; it depends on how developers interpret 'self'. I see two possibilities:

  1. 'self' is just a shortcut for the page's URL (for example, I am a small-website developer who wants to allow everything that comes from my website, but I don't want to hardcode my origin in my CSPs).
  2. 'self' means the document origin.

I think we should choose one of the two and follow it consistently.

Personally I think 1. is sounder, mainly because when we inherit policies, I believe we should inherit the meaning of 'self'. It does not make sense to me that a dedicated worker served from a different origin inherits CSPs, yet the effectively inherited policy is different because 'self' now means the new origin. (Indeed, the spec is quite explicit on this at the moment.)

I also believe 1. is clearer to web developers, while I do not see a real need for 2.
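
The inheritance argument can be illustrated with a short Python sketch. It is a toy model under stated assumptions, not how any browser implements policy inheritance: `Policy`, `allows`, and the two `inherit_*` functions are hypothetical, origins are plain strings, and the cross-origin inheriting context is the hypothetical case described above.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Policy:
    sources: tuple    # source expressions, e.g. ("'self'",)
    self_origin: str  # the origin that 'self' currently stands for


def allows(policy, request_origin):
    """Return True if any source expression matches the request's origin."""
    for source in policy.sources:
        expected = policy.self_origin if source == "'self'" else source
        if expected == request_origin:
            return True
    return False


def inherit_keeping_self(policy, new_origin):
    """Option 1: the inherited policy keeps the creator's meaning of 'self'."""
    return policy


def inherit_rebinding_self(policy, new_origin):
    """Option 2: 'self' is re-resolved against the inheriting context's origin."""
    return replace(policy, self_origin=new_origin)


creator = Policy(sources=("'self'",), self_origin="https://a.example")
kept = inherit_keeping_self(creator, "https://b.example")
rebound = inherit_rebinding_self(creator, "https://b.example")
```

Under option 1 the inherited policy still allows only `https://a.example`, so it is the same policy the creator had. Under option 2 the same policy text now allows `https://b.example` instead, which is the change in effective policy that the comment above objects to.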

annevk commented 3 years ago

I'm not sure I follow. Inheritance should never happen across the origin boundary.

And your response does not discuss sandboxing, which I think is where this comes into play most.

(Again, I'm not sure we can change this so we have to go with 1, so our disagreement probably does not matter.)

antosart commented 3 years ago

> I'm not sure I follow. Inheritance should never happen across the origin boundary.

If I understand the CSP spec correctly, "Dedicated workers now always inherit their creator’s policy.", even if cross-origin.

> And your response does not discuss sandboxing, which I think is where this comes into play most.

That's true, because it's the point I am most unsure about :) In favour of 1. for sandbox, I find it weird that you would have to change your CSPs just because your page gets embedded in a sandboxed iframe (which is, from some point of view, out of your control).

annevk commented 3 years ago

You are correct: for data: URL dedicated workers (which are indeed cross-origin) there will be inheritance, as decided per https://github.com/whatwg/html/issues/3270, as otherwise they could be used to escape a "sandbox". However, per that issue there should not generally be inheritance for dedicated workers. I'm not sure whether it makes sense for such a data: URL to still be allowed to fetch scripts from the same origin as the creator. (As it has an opaque origin, it could never fetch anything from there.)

(It might be good to add a worker section to https://github.com/antosart/policy-container-explained, come to think of it.)

Now for sandboxing. I think there are two cases:

  1. Document includes a frame that has a sandbox.
  2. Document's CSP sets a sandbox.

Having thought about it a bit I think I'm persuaded that it doesn't really matter what self maps to for these cases as the document author is in control either way.