mozilla / bleach

Bleach is an allowed-list-based HTML sanitizing library that escapes or strips markup and attributes
https://bleach.readthedocs.io/en/latest/
Other
2.65k stars 253 forks source link

switch to tinycss2 (#633) #648

Closed willkg closed 2 years ago

willkg commented 2 years ago

This switches to using tinycss2 for parsing CSS rules and serializing CSS tokens after sanitizing.

Things to think about:

  1. does this create new problems?
  2. does this fix some of the problems (hyphens, etc) we've got in the issue tracker so as to make it worth moving forward with?
  3. is the tinycss2 library dependable?
  4. should we additionally adjust the clean and Cleaner APIs to make it easier for users to use their own CSS sanitizers?
  5. should we also use this code to sanitize data in <style> tags?
  6. should we make tinycss2 an extras install? then it's not required for users who aren't sanitizing css.
willkg commented 2 years ago

@jvanasco @stucox @FelixSchwarz @kiilerix @Alex3917 @tony @EmilStenstrom @t-ionut and anyone else following along with sanitize css issues in Bleach, does this work for you? What else should we be thinking about and testing?

Alex3917 commented 2 years ago

Thanks @willkg, this looks great to me! It seems to be ~5% faster also. The only difference I notice over the ~400 tests I have for this is just the spaces being collapsed after colons, so e.g. 'font-space: 700' becoming 'font-space:700'. But that's fine with me!

kiilerix commented 2 years ago

... @kiilerix ... does this work for you? What else should we be thinking about and testing?

What is the goal of the CSS sanitization? I don't see anything in goals.rst that justifies it. If we trust browsers to handle garbage "safely" and don't care about "looks", what is left? And without an explicit statement of the goal, it is hard to comment on whether this change achieves it or not.

The problems I saw are fixed already by removing the previous unnecessarily strict and arbitrary CSS validation / sanitization.

I'm not sure about the use of tinycss2. While it presumably is "safe" for parsing arbitrary CSS, it might not provide the expected safety properties when used in this way. It is extra complexity and dependency. Does it really add any kind of actual security that justifies it?

I think I'm +1 for dropping the old code and -0 for introducing tinycss2.

Alex3917 commented 2 years ago

@kiilerix The issue with CSS is more that if a user can change the CSS for another user, then they can trick that user into doing dangerous things. E.g. if there is a button that says "Delete Account", and a malicious user changes it to say "Get Free Sample!" or something. Here is an article about this that was on Hacker News this week:

https://scotthelme.co.uk/can-you-get-pwned-with-css/

FelixSchwarz commented 2 years ago

I used tinycss2 in the past without problems (though I did not really do complicated stuff using the library). From a Fedora perspective we have v1.1+ in all supported Fedora releases and tinycss2 1.0.x in EPEL 8.

jvanasco commented 2 years ago

Our main application is a Content Management System. On a given "property" there are two general types of users: "admin" and "public".

We completely block inline CSS with bleach for security and UX concerns for all users in all contexts. I could not imagine a way to allow it in user-generated content that did not introduce issues.

We allow "admins" to upload custom templates. Those are cleaned by bleach and allow class attributes for CSS customization, and only injected into the public facing content templates – for security concerns there is a strict separation in where those can be injected, so it can not affect anything touching account or property management. Those are parsed by another library, I think it is tinycss. We created an overly complex way of implementing and restricting the CSS overrides with the template structure and load order, I don't recall the specifics offhand.

@Alex3917 Additional concerns with CSS include breaking UX, disrupting comment systems, and sneaking in tracking pixels or images via "url". The latter the same vector as described for data exfiltration in the link you shared, but something I saw attempted often by bots and scammers when I ran a large website with a highly active comment system.

willkg commented 2 years ago

^^^ That improves things a bit.

  1. It makes css sanitization require the css extras which will install tinycss2. Thus tinycss2 is now an optional dependency. It specifies an upper and lower bound for acceptable versions.
  2. It redoes the API around css sanitization so it's done with a class. This makes it easier for devs to drop in a different class if they need different functionality.
  3. It defaults to ALLOWED_CSS_PROPERTIES that html5lib sanitizer used rather than the empty list.
  4. It updates the documentation and tests accordingly.

I spent some time looking at adding support for sanitizing <style> content. I think I'm going to push that off.

I still need to figure out AtRule handling.

willkg commented 2 years ago

I'm going to punt on the AtRule handling. It seems nice for completeness, but I don't have any good ideas on how to do it and if someone needs it, they can write their own CSSSanitizer since they probably have other needs as well. Given that, I think I'm done working on this and it's ready for review.

willkg commented 2 years ago

@g-k Thank you for helping me through that!