rework how bleach.clean sanitizes css

mozilla / bleach

Bleach is an allowed-list-based HTML sanitizing library that escapes or strips markup and attributes

https://bleach.readthedocs.io/en/latest/

Other

2.65k stars 253 forks source link

rework how bleach.clean sanitizes css #633

Closed willkg closed 2 years ago

willkg commented 2 years ago

Bleach's Cleaner has a sanitize_css that's a complete pain in the ass to maintain. Not only is it an inscrutable regex for a complex specification, but it's got lots of edge cases and places where it just plain doesn't work. So far, the way we've been "handling" this is by modifying the regex to fix edge cases.

@g-k mentioned that maybe it's time we took a step back and re-evaluated the goals for sanitizing CSS. What browsers are we targeting? What attacks are we preventing? From there, we can figure out a different course of action. Maybe it's writing or switching to a real css parser. Maybe it's dropping css sanitizing altogether.

g-k commented 2 years ago

What browsers are we targeting?

I'd say either relatively recent Firefox, Safari, and Chromium-based browsers (say back to the last Fx ESR release or two) or not legacy browsers like Trident-based IE and other browsers from that era.

What attacks are we preventing?

I think this should basically be XSS, DOM Clobbering, and Structural Damage (bleach is Python so XSS via jQuery and Prototype Pollution) i.e. what DOMPurify handles minus the client side specific stuff. Since we expect bleach to run server side mXSS is a larger risk and bleach output should be safe for all supported browsers.

Maybe it's writing or switching to a real css parser. Maybe it's dropping css sanitizing altogether.

👍 in https://github.com/mozilla/bleach/issues/248 we looked at tinycss2 and some other options. It'll be easier to switch now that we're only on Python 3. Ideally we can provide a "use at your own risk" interface for people to swap in their own logic.

willkg commented 2 years ago

If there aren't CSS vectors, then I'd rather not sanitize CSS at all and just take all that code out. CSS only comes into play if someone is using .clean() and allowing style tags or style attributes. Otherwise this isn't an issue with our default safety stuff.

g-k commented 2 years ago

My only concern would be that we want to prevent damage to html outside the style attr or style elem.

For the attr, we'd want to validate that it can't be escaped. Doesn't contain quotes or double quotes is probably ok. IIRC we do this for other attrs.

The element is trickier, but maybe we can check that it closes and doesn't contain invalid CSS chars. There might be potential for mXSS if there's a way to use CSS to get html5lib out of sync with how browsers parse the style tag.

willkg commented 2 years ago

I don't have time right now to do a comprehensive analysis of CSS attack vectors, browser support, and use that to come up with a plan of action. Is that something you or someone else could do? If it's something we could do soon, then I'd be game for including this work in 5.0.0. Otherwise, I want to leave the sanitize_css stuff as is for 5.0.0 and target a future version.

g-k commented 2 years ago

I don't have time right now to do a comprehensive analysis of CSS attack vectors, browser support, and use that to come up with a plan of action. Is that something you or someone else could do?

Maybe? Probably not soon.

Just to be clear for 5.0 I'm thinking we could:

fix CI (mostly done? I think we just have to sort out missing hashes for some dev deps or drop hashes for dev deps)
drop 3.6 support (done)
remove the CSS url sanitizing regex to fix #529 and leave the rest of the regex-based CSS parsing in place for now. I can see a case for removing the regex based CSS parsing entirely, but like you say it'd require more analysis then we have time for.
drop support for legacy browsers & protection against the CSS url-based XSS for them
any other smaller bugfixes or refactoring you want to get in like splitting out the dev deps to another package and making CI less brittle

Does that sound reasonable? What are you thinking for 5.x?

willkg commented 2 years ago

I like everything you said. I created discussion topic https://github.com/mozilla/bleach/discussions/639 . We can do general release planning discussion there.

For reworking sanitize_css, it sounds like we need to do this:

figure out the target browsers (covered in issue #627
determine what we can drop from our css sanitization stuff; currently we think we want to:
1. remove css url sanitizing regex
2. leave everything else as is, but maybe extend the .clean() api to let people provide their own css sanitization function

Does that sound right?

willkg commented 2 years ago

This is all set now.