mozilla / bleach

Bleach is an allowed-list-based HTML sanitizing library that escapes or strips markup and attributes
https://bleach.readthedocs.io/en/latest/

fork html5lib-python or find alternative #680

Closed willkg closed 1 year ago

willkg commented 2 years ago

Bleach relies heavily on html5lib-python, and that project has been only loosely maintained for some time. In 2019, I stepped up to push out a 1.0.0 release. It hasn't had much activity since then. I think we should call it a dead project at this point.

We've looked at alternatives over the years, but haven't found anything that works well. Bleach has slightly different parsing needs than a library designed to parse HTML like a browser.

One alternative is to fork html5lib-python. That gives us a few things:

  1. if there's a security issue, we can fix it much more easily because the fix can be made entirely within Bleach
  2. we can begin to remove things we don't need that make Bleach hard to maintain
  3. we can fix the parts of the API that Bleach currently works around, in high-risk ways, with its shim
  4. we can modernize it: html5lib doesn't officially support Python > 3.8

Are there other viable alternatives? If so, can someone build a Bleach prototype with them?

If there are no viable alternatives, we should go with forking html5lib-python.

willkg commented 1 year ago

We're deprecating the project, so this is moot now.