mozilla / bleach

Bleach is an allowed-list-based HTML sanitizing library that escapes or strips markup and attributes
https://bleach.readthedocs.io/en/latest/

Fix character entity handling (#450) #647

Closed willkg closed 2 years ago

willkg commented 2 years ago

This fixes a bug where text that looks like a character entity (starts with &, followed by some letters, and ends with ;) but isn't one got mangled. Such text is now handled correctly.

This also fixes a bug where text that looks like a character entity but isn't one should have had its & escaped, but the escaping wasn't happening.

After this, Bleach does the following:

This is easy to reason about and idempotent and doesn't suffer from ambiguous edge cases where Bleach can't know what the user intended.
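
As a rough illustration of the idempotence property mentioned above, here is a minimal standalone sketch. It is not Bleach's implementation and does not call Bleach. The function name `escape_ampersands`, the regex, and the choice to treat every numeric reference as valid are assumptions made for the example. It keeps anything that Python's `html.entities.html5` table recognizes as a named entity and escapes every other `&`.

```python
import re
from html.entities import html5

# An "&", optionally followed by something shaped like an entity:
# a numeric reference or a name, terminated by ";".
_AMP_RE = re.compile(r"&(?:(#[0-9]+|#[xX][0-9a-fA-F]+|[A-Za-z][A-Za-z0-9]*);)?")


def _fix_ampersand(match):
    body = match.group(1)  # None when the "&" is not followed by "name;"
    # Assumption for this sketch: any numeric reference counts as valid.
    if body is not None and (body.startswith("#") or body + ";" in html5):
        return match.group(0)  # real entity: leave untouched
    # Not an entity: escape only the "&", keep the rest of the text as-is.
    return "&amp;" + match.group(0)[1:]


def escape_ampersands(text):
    """Hypothetical helper: escape every '&' that does not start a known entity."""
    return _AMP_RE.sub(_fix_ampersand, text)


if __name__ == "__main__":
    for sample in ["&amp;", "&nbsp;", "&xx;", "AT&T", "a & b", "&#39;"]:
        once = escape_ampersands(sample)
        twice = escape_ampersands(once)
        # Running the function again on its own output changes nothing.
        assert once == twice
        print(repr(sample), "->", repr(once))
```

Because the output contains only known entities and no bare `&`, a second pass finds nothing left to escape, which is the kind of predicate an idempotence test can check.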

I also cleaned up the comments in the tests so it's clearer what we intend Bleach to be doing.

This fixes #450.

willkg commented 2 years ago

Oh, that's a good idea. I'll add some more idempotent tests now.

willkg commented 2 years ago

@g-k ^^^ Does that cover the idempotent tests you were thinking of? Are there others we should add?

g-k commented 2 years ago

> Does that cover the idempotent tests you were thinking of? Are there others we should add?

Oh nice! I was thinking of it as a TODO / good predicates to check for #552

willkg commented 2 years ago

That makes sense. I haven't quite gotten around to looking at the fuzz stuff yet. I'll toss that in my list of things to do, but probably after the 5.0.0 release.

Thank you!