mganss / HtmlSanitizer

Cleans HTML to avoid XSS attacks
MIT License

Sanitizer does not remove comment but converts it to plain html #470

Open Sicos1977 opened 9 months ago

Sicos1977 commented 9 months ago

I am using the latest version from NuGet (not a beta version). When sanitizing the attached HTML, the comment between the script tags is not removed; for some reason it is converted to plain HTML instead.

comment.zip

mganss commented 9 months ago

What is your configuration? HTML comment syntax used inside a script element does not create an HTML comment; it becomes part of the script's text.
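To illustrate the parsing rule, here is a minimal sketch using Python's standard-library html.parser as a stand-in. It is not HtmlSanitizer or the parser it uses, and the sample markup is hypothetical, not the HTML attached to this issue. It only shows that comment-like markup inside a script element is reported as script text rather than as a comment.

```python
from html.parser import HTMLParser


class CommentProbe(HTMLParser):
    """Records real HTML comments separately from text inside <script>."""

    def __init__(self):
        super().__init__()
        self.comments = []      # text reported through handle_comment
        self.script_text = []   # raw text reported while inside <script>
        self._in_script = False

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self._in_script = True

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_comment(self, data):
        self.comments.append(data)

    def handle_data(self, data):
        if self._in_script:
            self.script_text.append(data)


def probe(html):
    """Return (list of real comments, concatenated script text)."""
    parser = CommentProbe()
    parser.feed(html)
    parser.close()
    return parser.comments, "".join(parser.script_text)


# Hypothetical input: one real comment, one comment-like span in a script.
sample = "<!-- real comment --><script><!-- alert(1) //--></script>"
comments, script_text = probe(sample)
```

Here only the first span is reported as a comment; the second ends up in script_text. A sanitizer that keeps or unwraps the script element would therefore see that span as ordinary text.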

Sicos1977 commented 9 months ago

What do you mean by configuration? I don't understand the question.

The HTML is coming from an e-mail that is sent to us from a customer. We convert that e-mail to PDF but sanitize it before doing so.

mganss commented 9 months ago

Sorry, I should have been clearer. By configuration I mean how you initialized the HtmlSanitizer object, which elements you allowed in AllowedTags, etc.

Sicos1977 commented 9 months ago

This is the code: https://github.com/Sicos1977/ChromiumHtmlToPdf/blob/master/ChromiumHtmlToPdfLib/Helpers/DocumentHelper.cs. It starts at line 189, and these are the settings.

Sorry for the Dutch comments.

A minus sign ( - ) means: first remove everything, then add the rows below the sign. An asterisk ( * ) means: use the default settings, then add the lines after it to those defaults.

image

mganss commented 9 months ago

I can't reproduce this. AFAICT you are using HtmlSanitizer in the default configuration (default allowed tags, attributes, etc.). In that configuration, the script tag is disallowed and should be removed, including its content. Can you provide a minimal example that shows the issue?

Sicos1977 commented 8 months ago

Sorry for the late response. I got sidetracked by other things, so I have to look into this again.