Closed iharsuvorau closed 7 years ago
What do you mean by "accurately"? What do you see, and what do you expect to see?
Sanitizing `<p>1 < 2</p>`, I'm getting `1 ` (with a space after the number) and I want to get `1 < 2`. That's because the parser ignores everything after the `<` char even if it's not an HTML tag.
I'm not sure that's valid HTML; you're supposed to escape less-than signs (in HTML5 at least). If you try validating this HTML here:
<!DOCTYPE html>
<html>
<head>
<title>asdf</title>
</head>
<body>
<p>1 < 2</p>
</body>
</html>
https://validator.w3.org/#validate_by_input
It will fail. It's hard for the parser to know whether this is the start of a tag or a less-than sign, so the &lt; entity would be better. Browsers may try to work around it, as they're used to broken HTML, but I would try to fix the HTML.
You could pre-process it by searching for ' < ' and replacing with ' &lt; ', but I think I'd rather not do that in sanitize.
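The pre-processing step suggested above could look roughly like the sketch below. It uses only the standard library; `escapeSpacedLessThan` is a hypothetical helper name, not something sanitize provides, and it assumes the stray less-than sign always has a space on both sides.

```go
package main

import (
	"fmt"
	"strings"
)

// escapeSpacedLessThan is a hypothetical helper (not part of sanitize).
// It replaces every " < " with " &lt; " so that a less-than sign
// surrounded by spaces is no longer mistaken for the start of a tag.
func escapeSpacedLessThan(s string) string {
	return strings.ReplaceAll(s, " < ", " &lt; ")
}

func main() {
	in := "<p>1 < 2</p>"
	// The escaped string is what you would then hand to the sanitizer.
	fmt.Println(escapeSpacedLessThan(in)) // prints: <p>1 &lt; 2</p>
}
```

This only catches the space-delimited case: input such as `a<b` is left unchanged, so fixing the HTML at its source is still the more reliable option.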
Is it correct that when using `sanitize.HTML`, HTML like `<p>1 < 2</p>` won't be parsed accurately?