filak closed this 1 year ago
&para is being consumed as an entity (it comes out as ¶). We fixed this in clean, and I think we need to fix linkify in a similar way.
There are more entities with the same effect, e.g. &not and &reg:
from bleach import Linker
linker = Linker()
text = 'http://test.com?a=1&notify=1&register=2'
print(linker.linkify(text))
## prints: <a href="http://test.com?a=1¬ify=1®ister=2" rel="nofollow">http://test.com?a=1¬ify=1®ister=2</a>
Adding for context:
This is related to #294. The W3C calls this "fragile syntax".
IIRC, before the HTML5 spec the trailing semicolon on named character references was NOT required, but it has been required since then (see "Errors involving fragile syntax constructs" in the original https://dev.w3.org/html5/spec-LC/Overview.html and the current https://html.spec.whatwg.org/#syntax-errors).
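For anyone who wants to see the "fragile syntax" effect without bleach installed, here is a minimal sketch using only the Python standard library. It assumes that html.unescape is a fair stand-in for an HTML5-style decoder working on text content; it is not bleach's code path, and the variable names are only for illustration.

```python
import html

# Query string whose parameter names happen to start with the legacy
# entity names "not" and "reg" (no trailing semicolon).
url = "http://test.com?a=1&notify=1&register=2"

# An HTML5-style decoder still accepts a handful of legacy names
# without the semicolon, so "&not" and "&reg" are decoded.
decoded = html.unescape(url)
print(decoded)  # http://test.com?a=1¬ify=1®ister=2

# Escaping the ampersands first makes the text unambiguous, and
# decoding the escaped form gives the original URL back.
escaped = html.escape(url)
round_tripped = html.unescape(escaped)
print(escaped)  # http://test.com?a=1&amp;notify=1&amp;register=2
```

The point is only that a bare & followed by a legacy entity name is ambiguous on input, so whatever emits the link has to write the ampersand as &amp; to keep the URL intact.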
To Reproduce
Expected behavior
Additional context
I believe this might happen somewhere in the html5lib_shim.py / BleachHTMLSerializer class: https://github.com/mozilla/bleach/blob/ed06d4e56b70e08fae2dd8f13b6a1955cf106029/bleach/html5lib_shim.py#L661
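To make the idea concrete, here is a rough sketch of the kind of escaping a serializer could apply: keep semicolon-terminated references that are known, and escape every other ampersand. This is NOT bleach's implementation, and escape_bare_ampersands is a hypothetical helper name; it only assumes the entity table in html.entities.html5 from the standard library.

```python
import re
from html.entities import html5

# An ampersand, optionally followed by a semicolon-terminated reference
# (decimal, hexadecimal, or named).
_AMP_OR_REF = re.compile(
    r"&(#[0-9]+;|#[xX][0-9a-fA-F]+;|[A-Za-z][A-Za-z0-9]*;)?"
)


def escape_bare_ampersands(text):
    """Hypothetical helper: escape '&' unless it starts a known,
    semicolon-terminated character reference."""

    def repl(match):
        ref = match.group(1)
        if ref is None:
            # Bare ampersand, e.g. the one in "&notify=1".
            return "&amp;"
        if ref.startswith("#") or ref in html5:
            # Numeric reference or known named reference: keep as is.
            return match.group(0)
        # Looks like a reference but the name is unknown: escape it.
        return "&amp;" + ref

    return _AMP_OR_REF.sub(repl, text)


fixed = escape_bare_ampersands("http://test.com?a=1&notify=1&register=2")
print(fixed)
```

With this sketch, references that already end in a semicolon (such as &amp; or &#38;) pass through unchanged, while &notify and &register have their ampersands escaped.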