rrrene / html_sanitize_ex

HTML sanitizer for Elixir
MIT License

strip_tags escapes ampersands, gt and lt #68

Open youroff opened 2 months ago

youroff commented 2 months ago

strip_tags replaces the &, > and < symbols with the corresponding HTML entities. This is unexpected, since those are not tags and they are not being stripped. Should this be part of a different function instead?

HarshBalyan commented 4 weeks ago

@rrrene any thoughts? I am using HtmlSanitizeEx.html5(some_binary) and it replaces the above-mentioned symbols with HTML entities. Is there any way to prevent this from happening?

rrrene commented 4 weeks ago

Hi, sorry for the late reply.

I seem to remember that this is done by the library we use for parsing the HTML.

Not sure we can prevent this, but I will try to look into it 👍

edit: I confirmed that :mochiweb_html is doing this, which causes this behaviour.
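
Since the escaping comes from the parsing step, one possible workaround is to reverse it after sanitizing. The following is a minimal sketch, not part of html_sanitize_ex: the module `PlainText` and the function `unescape_basic/1` are hypothetical names, and it assumes the three symbols come back as `&amp;`, `&lt;` and `&gt;`.

```elixir
defmodule PlainText do
  # Hypothetical helper, not part of html_sanitize_ex.
  # Assumes the sanitizer emitted &amp;, &lt; and &gt; for &, < and >.
  @doc "Reverses the escaping of &, < and > in text that has already been stripped."
  def unescape_basic(text) when is_binary(text) do
    text
    |> String.replace("&lt;", "<")
    |> String.replace("&gt;", ">")
    # Decode &amp; last so that "&amp;lt;" becomes "&lt;" rather than "<".
    |> String.replace("&amp;", "&")
  end
end

# Usage, assuming html_sanitize_ex is a dependency of the project:
#   some_binary |> HtmlSanitizeEx.strip_tags() |> PlainText.unescape_basic()
```

This is only appropriate when the result is treated as plain text. If the output is later rendered as HTML, the unescaped < and > could be read as markup again, so the entities should be left in place in that case.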

michaeljones commented 2 weeks ago

Has anyone identified a version of this library where this doesn't happen? Is there one that is safe to roll back to, or are there any concerns?