mganss / HtmlSanitizer

Cleans HTML to avoid XSS attacks
MIT License

Html sanitizer removes attributes which have \\\" #440

Closed: bjose7 closed this issue 1 year ago

bjose7 commented 1 year ago

{"config":{"content": "<div style=\"left: 0px; width: 100%; height: 0px; position: relative; padding-bottom: 56.25%; max-width: 650px;\">text</div>"}};

The above is a sample payload I get in a request body. When this payload reaches the API service (an Azure Function), the first thing we do is read it into a string with a StreamReader, but escape characters get added to it, as shown below:

{\"config\":{\"content\": \"<div style=\\\"left: 0px; width: 100%; height: 0px; position: relative; padding-bottom: 56.25%; max-width: 650px;\\\">text</div>\"}};

Now when I pass the above string to the sanitizer, it removes the whole style attribute. How do we handle this? What's the right way to sanitize this string? I tried replacing the \\" with single quotes, which worked, but the sanitizer then replaces the single quotes with \", so all quotes end up at the same level and it breaks the front end: when you render the result, the escapes are removed, but the ones inside the HTML tags are removed as well.

var htmlSan = new HtmlSanitizer();
Console.WriteLine(htmlSan.Sanitize(input));

What's the right way to handle this?

mganss commented 1 year ago

The whole string looks like JSON. First, deserialize the JSON string, then sanitize only the "content" property. Is the format of the payload always the same?

bjose7 commented 1 year ago

That's the problem: this payload can have any structure. So I cannot deserialize it to a standard model, right?

mganss commented 1 year ago

Yes, you can use the JSON DOM facilities in System.Text.Json for example.
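A minimal sketch of that approach, assuming .NET 6 or later (for System.Text.Json.Nodes) and that the HTML sits at config.content as in the sample payload. JsonContentSanitizer and SanitizeContent are hypothetical names, and the sanitizer is passed in as a delegate so the sketch depends only on System.Text.Json. It also assumes the real request body does not end in the trailing semicolon shown in the sample, which is not valid JSON.

```csharp
using System;
using System.Text.Json.Nodes;

public static class JsonContentSanitizer
{
    // Hypothetical helper: parses the body, sanitizes config.content, re-serializes.
    public static string SanitizeContent(string json, Func<string, string> sanitize)
    {
        // Parsing undoes the JSON escaping, so the sanitizer sees plain HTML
        // with ordinary double quotes around attribute values.
        JsonNode? root = JsonNode.Parse(json);
        JsonNode? config = root?["config"];

        if (config?["content"] is JsonValue value
            && value.TryGetValue(out string? html) && html is not null)
        {
            // Assigning a string replaces the old node with a new JSON string value.
            config["content"] = sanitize(html);
        }

        // Serializing re-applies whatever escaping JSON requires.
        return root?.ToJsonString() ?? "null";
    }
}
```

With HtmlSanitizer this could be called as JsonContentSanitizer.SanitizeContent(body, html => new HtmlSanitizer().Sanitize(html)). Note that with default options ToJsonString writes characters such as < and > as \u003C and \u003E; that is still valid JSON and decodes to the same string on the front end.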

mganss commented 1 year ago

You can just do completeNode["something"][0]["config"]["content"] = sanitizedContent;
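When the path to the HTML is not known in advance, the same DOM can be walked recursively instead of indexed by a fixed path. The following is only a sketch under assumptions: JsonTreeSanitizer and SanitizeProperties are hypothetical names, and it assumes the HTML always lives in string properties with a known name such as "content".

```csharp
using System;
using System.Linq;
using System.Text.Json.Nodes;

public static class JsonTreeSanitizer
{
    // Hypothetical helper: sanitizes every string property called propertyName,
    // at any depth, modifying the tree in place.
    public static void SanitizeProperties(
        JsonNode? node, string propertyName, Func<string, string> sanitize)
    {
        switch (node)
        {
            case JsonObject obj:
                // Snapshot the properties so replacing a value does not
                // modify the object while it is being enumerated.
                foreach (var (key, child) in obj.ToList())
                {
                    if (key == propertyName && child is JsonValue value
                        && value.TryGetValue(out string? html) && html is not null)
                    {
                        obj[key] = sanitize(html);
                    }
                    else
                    {
                        SanitizeProperties(child, propertyName, sanitize);
                    }
                }
                break;

            case JsonArray array:
                // Array items are only descended into; bare strings are left alone.
                foreach (JsonNode? item in array)
                {
                    SanitizeProperties(item, propertyName, sanitize);
                }
                break;
        }
    }
}
```

After JsonNode.Parse, a call such as JsonTreeSanitizer.SanitizeProperties(completeNode, "content", html => sanitizer.Sanitize(html)) would update the tree, and completeNode.ToJsonString() would produce the cleaned payload.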

bjose7 commented 1 year ago

Yep, thanks. That's what I ended up with.