Closed · lilles closed this issue 5 years ago
The rationale is that NULLs in a stylesheet are not useful, and NULL code points could be an indication of a buffer overrun, or an attempted attack that inserts NULL code points into the stylesheet.
Or another possibility - that you're trying to exfiltrate a local file (like a SQLite database) by getting it parsed as CSS and hoping you can capture a useful chunk of it in a `url()` function pointing to a malicious server.
I'm in support of this. At bare minimum, I'd like to automatically invalidate any property or rule containing a NULL, but I'm okay with killing the entire stylesheet too. Anything's better than Firefox's current "eh, just treat the property like it ended" behavior upon encountering a null while parsing a property.
Just curious, is there a test-case for this? Both Blink and Firefox show red in the following example:
```html
<!doctype html>
<style id="s">
</style>
<div>Which color?</div>
<script>
s.innerHTML = [
  "div { color: red; }",
  "div { color: green; }",
].join("\0");
</script>
```
I meant, a test-case for behavior difference between Firefox and other browsers.
Both Blink and Firefox show green in:
```html
<!doctype html>
<style id="s">
</style>
<div>Which color?</div>
<script>
s.innerHTML = [
  "div { color: red; } /*",
  "*/div { color: green; }",
].join("\0");
</script>
```
So I don't think https://github.com/w3c/csswg-drafts/issues/2757#issuecomment-396337601 is quite accurate.
Yes, both of those examples are showing between-rules NULL; the first gloms into the second selector and invalidates it, the second is in a comment and does nothing.
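For reference, the tokenizer behavior both examples rely on is the input preprocessing step of CSS Syntax Level 3, which replaces U+0000 (and lone surrogates) with U+FFFD before tokenizing. A minimal sketch of that step (the function name is mine, not from any implementation):

```javascript
// Sketch of the CSS Syntax input-preprocessing step: NULL and lone
// surrogates become U+FFFD before the tokenizer ever sees them.
function preprocess(cssText) {
  return cssText.replace(/\u0000|[\uD800-\uDFFF]/g, "\uFFFD");
}

const sheet = ["div { color: red; }", "div { color: green; }"].join("\0");
console.log(preprocess(sheet));
// "div { color: red; }\uFFFDdiv { color: green; }"
// The U+FFFD glues onto the second selector, which then matches no
// element, so the red declaration wins in both Blink and Firefox.
```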
```html
<!doctype html>
<style id="s">
</style>
<div>Which color?</div>
<script>
s.innerHTML = "div { color: green; } div { color: red\0 }";
</script>
```
Shows green in Chrome, but iirc (can't test at the moment) it shows red in Firefox.
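The difference here is where the replacement character ends up: inside a declaration value. A crude sketch of the reasoning (the validity check below is an illustrative stand-in, not a real `<color>` grammar):

```javascript
// With NULL replaced by U+FFFD per the tokenizer's preprocessing, the
// declared value tokenizes as the ident "red\uFFFD", which is not a
// valid <color>, so the whole declaration is dropped and the earlier
// green rule wins. (Old Gecko reportedly truncated at the NULL instead,
// successfully parsed "red", and showed red.)
const declared = "red\0".replace(/\u0000/g, "\uFFFD");
const namedColorIdent = /^[a-zA-Z]+$/; // crude stand-in for <color> parsing
console.log(namedColorIdent.test(declared)); // false: declaration invalid
```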
Seems like Gecko's been somewhat fixed since last time I tested. They used to truncate to "hello" for --a below:
```html
<div id="a"></div>
<div id="b"></div>
<script type="text/javascript">
var css = document.createElement("style");
css.innerHTML = '* { --a: hello\0world; font-family: "foo\0test"; }';
document.body.appendChild(css);
a.innerHTML = getComputedStyle(document.body).getPropertyValue('--a');
b.innerHTML = getComputedStyle(document.body).getPropertyValue('font-family');
</script>
```
Gecko does not replace with a replacement character in custom idents, it seems.
The Working Group just discussed *Consider disallowing NULL code points in stylesheets*.
> Shows green in Chrome, but iirc (can't test at the moment) it shows red in Firefox.
Green in Firefox. Maybe you tested before 57, which shipped in November 2017 with a rewritten CSS parser?
> Gecko does not replace with a replacement character in custom idents, it seems.
This is a bug, and specific to custom properties. I’ve filed https://bugzilla.mozilla.org/show_bug.cgi?id=1471772, thanks for mentioning it. Modified test case:
```html
<div id="a"></div>
<div id="b"></div>
<script>
var css = document.createElement("style");
css.innerHTML = '* { --a: hello\0world; }';
document.body.appendChild(css);
a.innerHTML = JSON.stringify(css.sheet.cssRules[0].style.getPropertyValue("--a"));
b.innerHTML = JSON.stringify(getComputedStyle(document.body).getPropertyValue('--a'));
</script>
```
Which in Firefox 61 produces:

```
" hello\u0000world"
" hello"
```
Back to the topic, I agree with @FremyCompany that the concern doesn’t seem real. There is no evidence to support the “could be a sign of an attack” claim.
Replacing with U+FFFD already makes most CSS constructs invalid or not matching. I don’t think we need to go out of our way to do anything else.
The attack scenario is: the attacker first gets the string `};};}; body { --foo:` stored somewhere inside a local file (a SQLite database is a plausible target), then has the victim page load that file with `<link rel=stylesheet href="file:...">`, and finally reads `getComputedStyle(document.body).getPropertyValue("--foo")`. This captures all of the contents of the file between where the string got stored and the next byte that gets interpreted as a `;` character - potentially a large chunk of the file, grabbing information from other sites.

Replacing NULL with U+FFFD doesn't solve this problem; that character is allowed in custom properties. Thus my minimal proposal of making a NULL automatically invalid in all contexts, so the custom property would be thrown out at parse time assuming that a NULL gets captured in the value (which is likely in the attack scenario mentioned; you'll probably find NULL bytes in a SQLite file). The maximal proposal, of invalidating the entire stylesheet if there's a NULL anywhere, would make it even safer, while having a fairly minimal chance of impact on actual stylesheets.
I believe that an HTTP(S) document would not be allowed to fetch a `file:` URL at all, but I couldn’t find that being specified. Can someone confirm either way?
At least, such a request would be cross-origin. Per https://drafts.csswg.org/cssom/#style-sheet-association, a cross-origin stylesheet is not applied to the document if it does not have an appropriate `Content-Type`.
Same-origin resources are not relevant here since they can be fetched with `XMLHttpRequest` anyway.
So the problem this proposal is trying to solve is: protecting against exfiltration of a cross-origin resource that is not intended to be CSS but is still served as `text/css`. Regardless of whether this is a problem worth solving in the first place, we cannot know intentions for sure, so at best we can guess with a heuristic.
We could imagine many different heuristics for “might not be intended as CSS syntax”. Using presence of U+0000 seems very arbitrary. What about other control code points? (U+0001 to U+001F except whitespace). Assuming UTF-8 decoding, what about 0xFF bytes? Or other byte sequences that are not well-formed UTF-8 and decoded as U+FFFD at that layer?
For any given heuristic, what is the risk of false positive? Even if there is no useful reason to deliberately make a stylesheet that contains { nulls, controls, 0xFF, ill-formed UTF-8, whatever }, is the reduction of exfiltration risk worth breaking real stylesheets where it happens accidentally?
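One way to see how arbitrary the choice is: each candidate heuristic is just a different code-point class. A sketch (the classes and function names are illustrative only, not proposed spec text):

```javascript
// Illustrative "might not be intended as CSS" heuristics, one per class
// of suspicious input. None of these is specified anywhere, and each has
// a different (unknown) false-positive rate on real stylesheets.
const heuristics = {
  hasNull: (s) => /\u0000/.test(s),
  // C0 controls other than TAB/LF/FF/CR, which CSS treats as whitespace
  hasControl: (s) => /[\u0001-\u0008\u000B\u000E-\u001F]/.test(s),
  // U+FFFD left behind by lossy UTF-8 decoding (e.g. a stray 0xFF byte)
  hasDecodeError: (s) => /\uFFFD/.test(s),
};

console.log(heuristics.hasNull("a { color: red\u0000; }")); // true
console.log(heuristics.hasControl("a { color: red; }"));    // false
```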
> I believe that an HTTP(S) document would not be allowed to fetch a `file:` URL at all, but I couldn’t find that being specified. Can someone confirm either way?
`file:` documents can request `file:` resources. `http:`/`https:` documents cannot, at least not in Chrome.
(Since my earlier comment wasn't clear: I support the original proposal. `file:` documents requesting `file:` resources leads to the potential of data exfiltration along the lines of what Tab suggested. We've seen exactly that attack in the vaguely recent past, and hardening the CSS parser to crash and burn on `\0` seems quite reasonable to me. I'd be equally happy to follow @SimonSapin's suggestion of widening the ban to a larger set of control characters. :) )
To be clear, my mentioning control characters was an attempt to show how arbitrary a heuristic it is to look for U+0000 or some other set of code points, not an actual proposal.
I think that the CSS tokenizer is the wrong layer to fix this. If the concern is, for example, with `file:///C:/Users/me/Downloads/evil.html` requesting `file:///C:/Users/Me/AppData/GoogleChrome/passwords.sqlite`, wouldn’t a heuristic based on URLs be better? For example, going "up" a directory, or going through a directory that the OS considers hidden.
Two thoughts:

1. Yes. We should lock down `file:` to a greater extent. We've had some discussion about that in https://github.com/whatwg/html/issues/3099, though the conversation didn't go anywhere and is somewhat dead.
2. Why would it be bad to restrict CSS files from containing `\0` or other control characters? I think your claim is that it might create breakage for some set of files out there in the world? I'm pretty sure I agree with @tabatkins' assumption above that this is rare to the point of nonexistence. Is there a philosophical objection as well, or would this practical concern be resolvable by adding metrics, digging through HTTPArchive, etc.?
It’d definitely break some files; we just don’t know how many. Maybe it’s few enough to be insignificant, perhaps by a wide margin. But this alone is not a sufficient reason to change mostly-interoperable behavior.
This thread has had no discussion of whether this proposal actually solves the problem it is trying to solve. Does a typical browser user’s filesystem not contain sensitive files that do not contain NULLs, or that are even entirely ASCII-printable? (Maybe a password database stored as JSON rather than SQLite?) Is there no exfiltration vector other than `<link rel=stylesheet>`? Should we also look for NULLs in the HTML and XML and JavaScript parsers? Or, in the other direction, should this change be specifically about custom properties rather than tokenizer-level?
More than the actual proposal, I’m worried about how the discussion of it is going (or not going).
Consider dropping stylesheet contents containing NULL code points or escaped NULLs, which are now replaced by the replacement character, completely. That is, let the result of parsing be as if the input was an empty string.