mganss / HtmlSanitizer

Cleans HTML to avoid XSS attacks
MIT License

What's the correct way to allow ld+json? #550

Closed: TopCoder02 closed this issue 3 weeks ago

TopCoder02 commented 3 weeks ago

What is the correct way to allow and sanitize JSON-LD?

I want to remove all JavaScript, but I also want to allow script elements of type application/ld+json and make sure their JSON content is sanitized. How do I accomplish this?

<script type="application/ld+json">
{
  "@context": "https://json-ld.org/contexts/person.jsonld",
  "@id": "http://dbpedia.org/resource/John_Lennon",
  "name": "John Lennon",
  "born": "1940-10-09",
  "spouse": "http://dbpedia.org/resource/Cynthia_Lennon"
}
</script>

mganss commented 3 weeks ago

This will keep script elements of type application/ld+json:

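// Requires: using AngleSharp.Html.Dom; (for IHtmlScriptElement)
// Cancel removal only for <script type="application/ld+json">; other script elements are still removed.
sanitizer.RemovingTag += (s, e) => e.Cancel = e.Tag is IHtmlScriptElement script
    && script.Type == "application/ld+json";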

What exactly do you mean by sanitizing the JSON? Can anything embedded in the JSON get executed as JavaScript? Where is the XSS potential here?

TopCoder02 commented 3 weeks ago

Sure, here's an example I put together: https://jsfiddle.net/30ctsyrf/1/

You will see it pop up two alerts, foo and foo1.

If I used the example you provided, it wouldn't remove the vulnerability. I think the content of the JSON needs to be scrubbed.

<script type="application/ld+json">
{
  "@context": "http://json-ld.org/contexts/person.jsonld",
  "@id": "http://dbpedia.org/resource/John_Lennon",
  "name": "</script><script>alert('foo');</script><script>",
  "</script><script>alert('foo1');</script><script>": "1940-10-09",
  "spouse": "http://dbpedia.org/resource/Cynthia_Lennon"
}
</script>

mganss commented 3 weeks ago

I'm getting the following output:

<script type="application/ld+json">
{
  "@context": "http://json-ld.org/contexts/person.jsonld",
  "@id": "http://dbpedia.org/resource/John_Lennon",
  "name": "</script>

Embedded HTML inside the JSON needs to be escaped.
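
One way to do that escaping, sketched here under assumptions: the JSON is valid, and it is escaped before it is placed inside the script element. In valid JSON the characters <, > and & can only occur inside string literals (values or property names), so replacing them with their \uXXXX escapes leaves the parsed data unchanged while removing any literal </script> from the markup. JsonLdEscaper and EscapeForScriptElement are hypothetical names used for illustration; they are not part of HtmlSanitizer.

```csharp
using System;

// Hypothetical helper for illustration; not part of HtmlSanitizer.
public static class JsonLdEscaper
{
    // Assumes `json` is valid JSON. Outside string literals, valid JSON
    // never contains '<', '>' or '&', so a plain textual replacement only
    // affects string contents, where \uXXXX escapes decode to the same
    // characters.
    public static string EscapeForScriptElement(string json)
    {
        if (json is null) throw new ArgumentNullException(nameof(json));

        return json
            .Replace("<", "\\u003c")
            .Replace(">", "\\u003e")
            .Replace("&", "\\u0026");
    }
}
```

For example, EscapeForScriptElement("{\"name\":\"</script>\"}") returns {"name":"\u003c/script\u003e"}, which a JSON parser reads back as the original string. The escaping has to be applied to the JSON itself before it is embedded: once unescaped JSON is already inside the HTML, the parser ends the script element at the first </script>, as the output above shows.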