mganss / HtmlSanitizer

Cleans HTML to avoid XSS attacks
MIT License

String not properly sanitised when KeepChildNodes = true #303

Closed by LolliDepp 3 years ago

LolliDepp commented 3 years ago

With the KeepChildNodes flag set to true, the following string:

\n<img<svg><script><img onerror=alert(document.domain) src\n\n<div><br></div>

gets sanitised to:

\n<img onerror=alert(document.domain) src\n\n<div><br></div>

Even with the KeepChildNodes flag set to true, I'd expect the onerror attribute to be sanitised, but maybe I'm just misunderstanding the capabilities of the library.

Setting KeepChildNodes to false results in the string \n

mganss commented 3 years ago

Are you sure that's the result? I'm getting:

\n&lt;img onerror=alert(document.domain) src\n\n&lt;div&gt;&lt;br&gt;&lt;/div&gt;

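The parser sees an element whose name is img<svg. It has a script child element, which in turn has no child elements, only the text content <script><img onerror=alert(document.domain) src\n\n<div><br></div></script>. Since that is text rather than markup, it comes out HTML-encoded, so the onerror never ends up as a real attribute.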

LolliDepp commented 3 years ago

I see, I missed an HtmlDecode call. The current behaviour makes perfect sense. Thanks for the answer.
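
For anyone who lands here with the same symptom, the effect of a stray decode step can be reproduced outside the library. The sketch below is only an illustration: it uses Python's html.unescape as a stand-in for a .NET HtmlDecode call, and it assumes the decode ran on the sanitizer's output, which the thread suggests but does not show. The two strings are the ones quoted above.

```python
import html

# Output reported by mganss: the payload survives only as encoded text.
sanitized = "\n&lt;img onerror=alert(document.domain) src\n\n&lt;div&gt;&lt;br&gt;&lt;/div&gt;"

# Assumption: a decode step runs after sanitising (html.unescape stands in
# for the HtmlDecode call mentioned in the thread).
decoded = html.unescape(sanitized)

# The encoded form contains no raw angle brackets, so a browser shows it as text.
print("<" in sanitized)

# After decoding, the raw <img ... onerror=...> markup is back.
print("<img onerror" in decoded)

# Re-encoding the decoded string gives back the sanitizer's output.
print(html.escape(decoded) == sanitized)
```

The decoded value is exactly the string from the original report, which is consistent with the sanitizer output having been safe and the later decode having undone the encoding.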