mganss / HtmlSanitizer

Cleans HTML to avoid XSS attacks

Sanitizing xml with Body Tag #475

Open Ghyath-Serhal opened 8 months ago

Ghyath-Serhal commented 8 months ago

I am using HtmlSanitizer to sanitize the XML data below, which contains a body tag.

<?xml version="1.0" encoding="utf-8"?>
<Tag1 xmlns="urn:swift:saa:xsd:saa.2.0">
  <tag2>This is tag 2</tag2>
  <tag3>This is tag 3</tag3>
  <body>this is the body</body>
</Tag1>

I have added tag1, tag2, tag3, and body to the AllowedTags property. I get the result below. As you can see, the body tag is removed and only its text content remains.

<tag1 xmlns="urn:swift:saa:xsd:saa.2.0">
  <tag2>This is tag 2</tag2>
  <tag3>This is tag 3</tag3>
  this is the body
</tag1>

mganss commented 8 months ago

HtmlSanitizer is only intended to sanitize HTML. When a fragment is passed to the Sanitize() method, it is wrapped in a body element before it is parsed by AngleSharp's HTML parser. Because an HTML document can have only one body element, the parser then drops the additional body tag in the fragment. I currently don't see a way around this. The wrapping happens here: https://github.com/mganss/HtmlSanitizer/blob/28bdf0e0a1a143735a6be7858a38eaea772fcfef/src/HtmlSanitizer/HtmlSanitizer.cs#L386

You can experiment with the SanitizeDom() overload that takes an IHtmlDocument, but you'd need to coerce AngleSharp into keeping the body element somehow.

In theory, you could also work with the AngleSharp.Xml package. The problem is that HtmlSanitizer makes extensive use of AngleSharp's IHtmlDocument interface, so it would probably be hard to add support for XML.
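
To illustrate why an XML parser sidesteps the problem: in XML, body is just another element name, so an allowlist filter built on an XML parser keeps it. The following is only a rough sketch of that idea and is not part of HtmlSanitizer or AngleSharp. It uses Python's standard xml.etree.ElementTree purely for illustration. The names sanitize_xml, local_name, and ALLOWED_TAGS are hypothetical, and the extra script element was added to the sample so there is something to remove. It filters element names only and does no attribute or URL checks.

```python
import xml.etree.ElementTree as ET

# Sample from the issue, plus a <script> element added for illustration.
SAMPLE = """<Tag1 xmlns="urn:swift:saa:xsd:saa.2.0">
  <tag2>This is tag 2</tag2>
  <tag3>This is tag 3</tag3>
  <body>this is the body</body>
  <script>alert(1)</script>
</Tag1>"""

# Hypothetical allowlist of element names (XML names are case-sensitive).
ALLOWED_TAGS = {"Tag1", "tag2", "tag3", "body"}


def local_name(tag):
    """Strip the '{namespace}' prefix ElementTree puts on qualified names."""
    return tag.rsplit("}", 1)[-1]


def _prune(element, allowed):
    """Remove disallowed child elements (with their content), recursively."""
    for child in list(element):  # copy, because we mutate while iterating
        if local_name(child.tag) in allowed:
            _prune(child, allowed)
        else:
            element.remove(child)


def sanitize_xml(xml_text, allowed=ALLOWED_TAGS):
    """Parse xml_text as XML and keep only elements named in the allowlist."""
    root = ET.fromstring(xml_text)
    if local_name(root.tag) not in allowed:
        raise ValueError("root element is not allowed")
    _prune(root, allowed)
    return root


clean = sanitize_xml(SAMPLE)
print([local_name(e.tag) for e in clean.iter()])
# ['Tag1', 'tag2', 'tag3', 'body']
```

Here the body element survives because nothing in the XML parser gives that name a special meaning, and the element name Tag1 keeps its original casing.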

I'm interested to hear what your use case is. Where's the XSS vector in your scenario?