mozilla / bleach

Bleach is an allowed-list-based HTML sanitizing library that escapes or strips markup and attributes
https://bleach.readthedocs.io/en/latest/

bug: Open left angle bracket followed immediately by an alpha character causes next tag to be sanitized #733

Status: Open. fayyazul-centaurlabs opened this issue 3 months ago.

fayyazul-centaurlabs commented 3 months ago

Describe the bug

When an unclosed left angle bracket (used as a less-than sign) is immediately followed by an alphabetic character, the next tag is sanitized: it is escaped instead of being kept, and its closing tag is dropped.

To Reproduce

Steps to reproduce the behavior:

```
>>> bleach.clean('<t <a></a> <a></a>')
'&lt;t &lt;a&gt; <a></a>'
```
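The escaped output above is consistent with how the HTML tokenizer treats `<` inside a tag, assuming bleach's html5lib-based parser follows the WHATWG tokenization rules here. Once `<t` opens a start tag, the space moves the tokenizer into the attribute-name states, where a `<` is only a parse error and becomes part of an attribute name. `<t <a>` would then be a single `t` tag with an attribute named `<a`. bleach escapes it because `t` is not an allowed tag, and the first `</a>` is dropped because nothing opened it. The sketch below is a heavily simplified tokenizer written only to illustrate that reading. `tokenize` is a hypothetical helper, not part of bleach or html5lib, and it ignores attribute values, quotes, comments and character references.

```python
import string

# Hypothetical, simplified sketch of a few WHATWG tokenizer states
# (tag open, tag name, attribute name). Not bleach or html5lib code.
WHITESPACE = " \t\n\f"


def tokenize(text):
    """Split text into ("text", data), ("start", name, attrs) and ("end", name) tokens."""
    tokens = []
    buf = []  # pending character data
    i, n = 0, len(text)
    while i < n:
        if text[i] != "<":
            buf.append(text[i])
            i += 1
            continue
        is_end = text[i + 1:i + 2] == "/"
        name_start = i + 2 if is_end else i + 1
        # Tag open state: only an ASCII letter starts a tag name, otherwise "<" is text.
        # (This sketch also treats "<!", "<?" and "</5" as text, which the spec does not.)
        if name_start >= n or text[name_start] not in string.ascii_letters:
            buf.append("<")
            i += 1
            continue
        # Tag name state: the name runs until whitespace, "/" or ">".
        j = name_start
        while j < n and text[j] not in WHITESPACE + "/>":
            j += 1
        name = text[name_start:j].lower()
        # Attribute name states: any other character starts or extends an attribute
        # name. A "<" is only a parse error here, so it ends up inside the name.
        attrs = []
        while j < n and text[j] != ">":
            if text[j] in WHITESPACE + "/":
                j += 1
                continue
            k = j
            while k < n and text[k] not in WHITESPACE + "/>":
                k += 1
            attrs.append(text[j:k].lower())
            j = k
        if j >= n:
            break  # end of input inside a tag: the unfinished tag is dropped
        if buf:
            tokens.append(("text", "".join(buf)))
            buf = []
        tokens.append(("end", name) if is_end else ("start", name, attrs))
        i = j + 1
    if buf:
        tokens.append(("text", "".join(buf)))
    return tokens


for tok in tokenize("<t <a></a> <a></a>"):
    print(tok)
```

In this sketch the first token is `('start', 't', ['<a'])`, so the first `<a>` never becomes a tag of its own, while the second `<a></a>` pair comes through intact.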

Expected behavior

```
>>> bleach.clean('<t <a></a> <a></a>')
'&lt;t <a></a> <a></a>'
```

Additional context

The error does not occur when the character after the left angle bracket is not alphabetic:

```
>>> bleach.clean('<5 <a></a> <a></a>')
'&lt;5 <a></a> <a></a>'
```

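The `<5` case behaves differently because of the tokenizer's tag open state. In the WHATWG HTML specification, a `<` starts a tag name only when the next character is an ASCII letter. A digit, a space or the end of input makes the `<` plain text. The helper below is a hypothetical illustration of that one state (`tag_open_outcome` and its return strings are informal labels of our own), not code from bleach or html5lib.

```python
import string


def tag_open_outcome(next_char):
    """Informally label what the tag open state does with the character after "<".

    next_char is the character following "<", or "" at end of input.
    Hypothetical helper simplified from the WHATWG tokenizer, not a library API.
    """
    if next_char == "!":
        return "markup declaration"  # e.g. <!-- ... --> or <!DOCTYPE html>
    if next_char == "/":
        return "end tag open"        # e.g. </a>
    if next_char and next_char in string.ascii_letters:
        return "start tag"           # e.g. <a>, and also the stray <t from this issue
    if next_char == "?":
        return "bogus comment"       # parse error, e.g. <?xml ...?>
    return "text"                    # "<" is emitted as a literal character


for sample in ["<t", "<5", "<a", "< t"]:
    print(repr(sample), "->", tag_open_outcome(sample[1:2]))
```

Under this rule `<t` enters the tag-name state exactly like `<a`, which fits the observation that only an alphabetic character after the stray `<` triggers the problem.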
willkg commented 3 months ago

The test cases are helpful. Does this issue come up often? If so, what does the corpus look like?