mganss / HtmlSanitizer

Cleans HTML to avoid XSS attacks
MIT License

invalid characters inserted #411

Closed NGemballa closed 1 year ago

NGemballa commented 1 year ago

When sanitizing the HTML in the attached file (dirtyhtml.txt), the sanitizer inserts a special character into the style information.

The inserted character is U+FFFF (0xFFFF), which causes an exception when the result is passed to an XML serializer. See the attached screenshot and sanitiedhtml.txt.
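For readers wondering why this one character breaks XML serialization: U+FFFF lies outside the `Char` production of XML 1.0, so a conforming serializer has to reject it. The sketch below is not part of the original report and does not use HtmlSanitizer. It is a minimal Python illustration of locating and stripping such characters, and the helper names and the sample string are hypothetical. In .NET, `System.Xml.XmlConvert.IsXmlChar` offers a comparable per-character check.

```python
import re

# XML 1.0 "Char" production:
#   #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
# Anything outside these ranges (including U+FFFE and U+FFFF) is not allowed.
_INVALID_XML_CHARS = re.compile(
    "[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def find_invalid_xml_chars(text):
    """Return (index, code point) pairs for characters XML 1.0 forbids."""
    return [
        (m.start(), "U+{:04X}".format(ord(m.group())))
        for m in _INVALID_XML_CHARS.finditer(text)
    ]


def strip_invalid_xml_chars(text):
    """Remove every character that is not a legal XML 1.0 character."""
    return _INVALID_XML_CHARS.sub("", text)


if __name__ == "__main__":
    # Hypothetical sanitizer output with a stray U+FFFF in a style value.
    sanitized = 'font-family: "Arial\uffff'
    print(find_invalid_xml_chars(sanitized))
    print(repr(strip_invalid_xml_chars(sanitized)))
```

Stripping the character only hides the symptom before serialization; it does not explain why the sanitizer emitted it in the first place.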

mganss commented 1 year ago

I can't repro. I have a feeling this is an encoding issue. How do you read the file into a string before sanitizing?

NGemballa commented 1 year ago

Hi Michael!

Thanks for the fast reply!

No, it's not an encoding issue. While preparing a demo, I found that it occurs when you add "data" to the AllowedSchemes property.

mganss commented 1 year ago

I still can't repro. Can you provide a snippet of code that shows the issue?

NGemballa commented 1 year ago

Sure, it's based on .NET Framework 4.8: Program.txt

mganss commented 1 year ago

Still can't repro 🤷🏻‍♂️ Made a fresh console app and renamed Program.txt to Program.cs. Had to rename the namespace to Ganss.Xss to accommodate the latest version of HtmlSanitizer.

NGemballa commented 1 year ago

Sorry, my fault. I forgot to check the HtmlSanitizer version in my test project. After updating to the latest version, it works with the code attached earlier. But I still have the issue with the original code. I attached a demo project including the source HTML (stripped down to avoid sharing personal data). In the sanitized HTML the special character is added (see the attached screenshot). Hope that helps to reproduce the issue. HtmlSanitizerTest.zip

mganss commented 1 year ago

This occurs due to a CSS rendering issue inside AngleSharp.Css reported here: https://github.com/AngleSharp/AngleSharp.Css/issues/123

The " characters inside the style attribute are unbalanced, which may be what triggers the issue. Perhaps you can work around it by fixing this in the original source.

NGemballa commented 1 year ago

Yes, we already did a workaround.

Thanks for analyzing and reporting!

mganss commented 1 year ago

This has been fixed in 8.0.691-beta. In addition to the bug in AngleSharp.Css, there was a bug in HtmlSanitizer that prevented this use case from working. This bug has been fixed in 8.0.692 as well, but note that this use case won't work in 8.0.692 due to the bug in AngleSharp.Css 0.17.0.