mganss / HtmlSanitizer

Cleans HTML to avoid XSS attacks
MIT License

HtmlSanitization removes Allowed Attributes from HTML content within a JSON string #551

Closed: atamir93 closed this issue 4 days ago.

atamir93 commented 4 days ago

Hi,

I am using the HtmlSanitizer library in my project. I have a property, jsonColumn, whose type is JSON, with the following value:

{
  "jsonColumn": "{\"content\":\"<a href=\\\"https://www.siemens.com/\\\" class=\\\"footer-link\\\">siemens.com</a> Global Website Intern &copy; Siemens AG, 2024\"}"
}

After sanitizing, I get this result, even though "class" is an allowed attribute:

{
  "jsonColumn": "{\"content\":\"<a>siemens.com</a> Global Website Intern © Siemens AG, 2024\"}"
}

My goal is to preserve the original markup without losing "href" and "class". It works if I use single quotes, but then SQL throws an exception because it cannot parse the value as JSON:

{
  "jsonColumn": "{\"content\":\"<a href='https://www.siemens.com/' class='footer-link'>siemens.com</a> Global Website Intern © Siemens AG, 2024\"}"
}

I would appreciate any suggestions on how to resolve this issue. Thanks.

mganss commented 4 days ago

HtmlSanitizer sanitizes only HTML, not JSON. I suggest deserializing the JSON, then sanitizing the value of the content property, and then either serializing it back to JSON or using it in whatever other way fits your use case.
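A minimal sketch of that round trip, written in Python purely for illustration (the project itself uses .NET). It assumes the record has exactly the two-level shape shown in the issue: an outer object whose `jsonColumn` is a string holding JSON, which in turn has a `content` field with HTML. The helper name `sanitize_json_column` and the `identity` placeholder are hypothetical; the `sanitize` parameter stands in for whatever sanitizer is configured (in .NET, HtmlSanitizer's `Sanitize` method).

```python
import json
from typing import Callable


def sanitize_json_column(record_json: str, sanitize: Callable[[str], str]) -> str:
    """Sanitize only the HTML in jsonColumn -> content; let the JSON library handle escaping.

    Hypothetical helper: assumes the two-level shape shown in the issue.
    """
    record = json.loads(record_json)          # outer object: {"jsonColumn": "<JSON text>"}
    inner = json.loads(record["jsonColumn"])  # inner object: {"content": "<HTML>"}
    # The sanitizer now sees ordinary HTML with plain double-quoted attributes,
    # not the backslash-escaped form that appears inside the JSON text.
    inner["content"] = sanitize(inner["content"])
    # Serialize both levels again; json.dumps re-escapes quotes and backslashes,
    # so double-quoted attributes survive and the result is valid JSON.
    record["jsonColumn"] = json.dumps(inner, separators=(",", ":"), ensure_ascii=False)
    return json.dumps(record, ensure_ascii=False)


# The value from the issue, as raw JSON text.
issue_value = r'{"jsonColumn": "{\"content\":\"<a href=\\\"https://www.siemens.com/\\\" class=\\\"footer-link\\\">siemens.com</a> Global Website Intern &copy; Siemens AG, 2024\"}"}'

seen = []


def identity(html: str) -> str:
    # Hypothetical placeholder sanitizer: records its input and returns it unchanged.
    seen.append(html)
    return html


out = sanitize_json_column(issue_value, identity)
print(seen[0])  # <a href="https://www.siemens.com/" class="footer-link">siemens.com</a> Global Website Intern &copy; Siemens AG, 2024
print(out == issue_value)  # True
```

The point of the design is that escaping happens in exactly one place: the sanitizer only ever receives plain HTML, and the JSON library alone decides how quotes are escaped on the way back out. Whether `href` and `class` are then kept still depends on the real sanitizer's allowed attributes and URL schemes.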