Closed · lilles closed this issue 5 years ago
The rationale is that NULLs in a stylesheet are not useful, and NULL code points could be an indication of a buffer overrun, or an attempted attack that inserts NULL code points into the stylesheet.
Or another possibility - that you're trying to exfiltrate a local file (like a SQLite database) by getting it parsed as CSS and hoping you can capture a useful chunk of it in a `url()` function pointing to a malicious server.
I'm in support of this. At bare minimum, I'd like to automatically invalidate any property or rule containing a NULL, but I'm okay with killing the entire stylesheet too. Anything's better than Firefox's current "eh, just treat the property like it ended" behavior upon encountering a null while parsing a property.
Just curious, is there a test-case for this? Both Blink and Firefox show red in the following example:
```html
<!doctype html>
<style id="s">
</style>
<div>Which color?</div>
<script>
s.innerHTML = [
  "div { color: red; }",
  "div { color: green; }",
].join("\0");
</script>
```
I meant, a test-case for behavior difference between Firefox and other browsers.
Both Blink and Firefox show green in:
```html
<!doctype html>
<style id="s">
</style>
<div>Which color?</div>
<script>
s.innerHTML = [
  "div { color: red; } /*",
  "*/div { color: green; }",
].join("\0");
</script>
```
So I don't think https://github.com/w3c/csswg-drafts/issues/2757#issuecomment-396337601 is quite accurate.
Yes, both of those examples are showing between-rules NULL; the first gloms into the second selector and invalidates it, the second is in a comment and does nothing.
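For reference, the tokenizer behavior both examples rely on is the input preprocessing step of CSS Syntax Level 3, which replaces U+0000 (and lone surrogates) with U+FFFD before tokenizing. A minimal sketch of that step (the function name is mine, not from any implementation):

```javascript
// Sketch of the CSS Syntax input-preprocessing step: NULL and lone
// surrogates become U+FFFD before the tokenizer ever sees them.
function preprocess(cssText) {
  return cssText.replace(/\u0000|[\uD800-\uDFFF]/g, "\uFFFD");
}

const sheet = ["div { color: red; }", "div { color: green; }"].join("\0");
console.log(preprocess(sheet));
// "div { color: red; }\uFFFDdiv { color: green; }"
// The U+FFFD glues onto the second selector, which then matches no
// element, so the red declaration wins in both Blink and Firefox.
```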
```html
<!doctype html>
<style id="s">
</style>
<div>Which color?</div>
<script>
s.innerHTML = "div { color: green; } div { color: red\0 }";
</script>
```
Shows green in Chrome, but iirc (can't test at the moment) it shows red in Firefox.
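The difference here is where the replacement character ends up: inside a declaration value. A crude sketch of the reasoning (the validity check below is an illustrative stand-in, not a real `<color>` grammar):

```javascript
// With NULL replaced by U+FFFD per the tokenizer's preprocessing, the
// declared value tokenizes as the ident "red\uFFFD", which is not a
// valid <color>, so the whole declaration is dropped and the earlier
// green rule wins. (Old Gecko reportedly truncated at the NULL instead,
// successfully parsed "red", and showed red.)
const declared = "red\0".replace(/\u0000/g, "\uFFFD");
const namedColorIdent = /^[a-zA-Z]+$/; // crude stand-in for <color> parsing
console.log(namedColorIdent.test(declared)); // false: declaration invalid
```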
Seems like Gecko's been somewhat fixed since last time I tested. They used to truncate to "hello" for --a below:
```html
<div id="a"></div>
<div id="b"></div>
<script type="text/javascript">
var css = document.createElement("style");
css.innerHTML = '* { --a: hello\0world; font-family: "foo\0test"; }';
document.body.appendChild(css);
a.innerHTML = getComputedStyle(document.body).getPropertyValue('--a');
b.innerHTML = getComputedStyle(document.body).getPropertyValue('font-family');
</script>
```
Gecko does not replace with a replacement character in custom idents, it seems.
The Working Group just discussed *Consider disallowing NULL code points in stylesheets*.
> Shows green in Chrome, but iirc (can't test at the moment) it shows red in Firefox.
Green in Firefox. Maybe you tested before 57, which shipped in November 2017 with a rewritten CSS parser?
> Gecko does not replace with a replacement character in custom idents, it seems.
This is a bug, and specific to custom properties. I’ve filed https://bugzilla.mozilla.org/show_bug.cgi?id=1471772, thanks for mentioning it. Modified test case:
```html
<div id="a"></div>
<div id="b"></div>
<script>
var css = document.createElement("style");
css.innerHTML = '* { --a: hello\0world; }';
document.body.appendChild(css);
a.innerHTML = JSON.stringify(css.sheet.cssRules[0].style.getPropertyValue("--a"));
b.innerHTML = JSON.stringify(getComputedStyle(document.body).getPropertyValue('--a'));
</script>
```
Which in Firefox 61 produces:

```
" hello\u0000world"
" hello"
```
Back to the topic, I agree with @FremyCompany that the concern doesn’t seem real. There is no evidence to support the “could be a sign of an attack” claim.
Replacing with U+FFFD already makes most CSS constructs invalid or not matching. I don’t think we need to go out of our way to do anything else.
The attack scenario is: the attacker first gets the string `};};}; body { --foo:` stored somewhere inside a local file (a SQLite database is a plausible target), then has the victim page load that file with `<link rel=stylesheet href="file:...">`, and finally reads `getComputedStyle(document.body).getPropertyValue("--foo")`. This captures all of the contents of the file between where the string got stored and the next byte that gets interpreted as a `;` character - potentially a large chunk of the file, grabbing information from other sites.

Replacing NULL with U+FFFD doesn't solve this problem; that character is allowed in custom properties. Thus my minimal proposal of making a NULL automatically invalid in all contexts, so the custom property would be thrown out at parse time assuming that a NULL gets captured in the value (which is likely in the attack scenario mentioned; you'll probably find NULL bytes in a SQLite file). The maximal proposal, of invalidating the entire stylesheet if there's a NULL anywhere, would make it even safer, while having a fairly minimal chance of impact on actual stylesheets.
I believe that an HTTP(S) document would not be allowed to fetch a `file:` URL at all, but I couldn’t find that being specified. Can someone confirm either way?
At least, such a request would be cross-origin. Per https://drafts.csswg.org/cssom/#style-sheet-association, a cross-origin stylesheet is not applied to the document if it does not have an appropriate `Content-Type`.
Same-origin resources are not relevant here since they can be fetched with `XMLHttpRequest` anyway.
So the problem this proposal is trying to solve is: protecting against exfiltration of a cross-origin resource that is not intended to be CSS but is still served as `text/css`. Regardless of whether this is a problem worth solving in the first place, we cannot know intentions for sure, so at best we can guess with a heuristic.
We could imagine many different heuristics for “might not be intended as CSS syntax”. Using presence of U+0000 seems very arbitrary. What about other control code points? (U+0001 to U+001F except whitespace). Assuming UTF-8 decoding, what about 0xFF bytes? Or other byte sequences that are not well-formed UTF-8 and decoded as U+FFFD at that layer?
For any given heuristic, what is the risk of false positive? Even if there is no useful reason to deliberately make a stylesheet that contains { nulls, controls, 0xFF, ill-formed UTF-8, whatever }, is the reduction of exfiltration risk worth breaking real stylesheets where it happens accidentally?
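One way to see how arbitrary the choice is: each candidate heuristic is just a different code-point class. A sketch (the classes and function names are illustrative only, not proposed spec text):

```javascript
// Illustrative "might not be intended as CSS" heuristics, one per class
// of suspicious input. None of these is specified anywhere, and each has
// a different (unknown) false-positive rate on real stylesheets.
const heuristics = {
  hasNull: (s) => /\u0000/.test(s),
  // C0 controls other than TAB/LF/FF/CR, which CSS treats as whitespace
  hasControl: (s) => /[\u0001-\u0008\u000B\u000E-\u001F]/.test(s),
  // U+FFFD left behind by lossy UTF-8 decoding (e.g. a stray 0xFF byte)
  hasDecodeError: (s) => /\uFFFD/.test(s),
};

console.log(heuristics.hasNull("a { color: red\u0000; }")); // true
console.log(heuristics.hasControl("a { color: red; }"));    // false
```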
> I believe that an HTTP(S) document would not be allowed to fetch a `file:` URL at all, but I couldn’t find that being specified. Can someone confirm either way?
`file:` documents can request `file:` resources. `http:`/`https:` documents cannot, at least not in Chrome.
(Since my earlier comment wasn't clear: I support the original proposal. `file:` documents requesting `file:` resources leads to the potential of data exfiltration along the lines of what Tab suggested. We've seen exactly that attack in the vaguely recent past, and hardening the CSS parser to crash and burn on `\0` seems quite reasonable to me. I'd be equally happy to follow @SimonSapin's suggestion of widening the ban to a larger set of control characters. :) )
To be clear, my mentioning control characters was an attempt to show how arbitrary a heuristic it is to look for U+0000 or some other set of code points, not an actual proposal.
I think that the CSS tokenizer is the wrong layer to fix this. If the concern is, for example, with `file:///C:/Users/me/Downloads/evil.html` requesting `file:///C:/Users/Me/AppData/GoogleChrome/passwords.sqlite`, wouldn’t a heuristic based on URLs be better? For example, going "up" a directory, or going through a directory that the OS considers hidden.
Two thoughts:

1. Yes. We should lock down `file:` to a greater extent. We've had some discussion about that in https://github.com/whatwg/html/issues/3099, though the conversation didn't go anywhere and is somewhat dead.
2. Why would it be bad to restrict CSS files from containing `\0` or other control characters? I think your claim is that it might create breakage for some set of files out there in the world? I'm pretty sure I agree with @tabatkins' assumption above that this is rare to the point of nonexistence. Is there a philosophical objection as well, or would this practical concern be resolvable by adding metrics, digging through HTTPArchive, etc.?
It’d definitely break some files; we just don’t know how many. Maybe it’s few enough to be insignificant, perhaps by a wide margin. But this alone is not a sufficient reason to change mostly-interoperable behavior.
This thread has had no discussion of whether this proposal actually solves the problem it is trying to solve. Does a typical browser user’s filesystem not contain sensitive files that do not contain NULLs, or that are even entirely ASCII-printable? (Maybe a password database stored as JSON rather than SQLite?) Is there no exfiltration vector other than `<link rel=stylesheet>`? Should we also look for NULLs in the HTML and XML and JavaScript parsers? Or, in the other direction, should this change be specifically about custom properties rather than tokenizer-level?
More than the actual proposal, I’m worried about how the discussion of it is going (or not going).
Consider dropping stylesheet contents containing NULL code points or escaped NULLs, which are now replaced by the replacement character, completely. That is, let the result of parsing be as if the input was an empty string.