NGemballa closed this 1 year ago
I can't repro. I have a feeling this is an encoding issue. How do you read the file into a string before sanitizing?
Hi Michael!
Thanks for the fast reply!
No, it's not an encoding issue. While putting together a demo, I found that it occurs when you add "data" to the AllowedSchemes property.
I still can't repro. Can you provide a snippet of code that shows the issue?
Sure, it's based on .NET Framework 4.8: Program.txt
Still can't repro 🤷🏻♂️ I made a fresh console app and renamed Program.txt to Program.cs. I had to rename the namespace to Ganss.Xss to accommodate the latest version of HtmlSanitizer.
Sorry, my fault. I forgot to check the HtmlSanitizer version in my test project. After updating to the latest version, it works with the code attached earlier. But I still have the issue with the original code. I attached a demo project including the source HTML (stripped down to avoid sharing personal data). The special character is added in the sanitized HTML. Hope that helps to reproduce the issue. HtmlSanitizerTest.zip
This occurs due to a CSS rendering issue inside AngleSharp.Css reported here: https://github.com/AngleSharp/AngleSharp.Css/issues/123
The " characters inside the style attribute are unbalanced, which may be what is triggering the issue. Perhaps you can work around the issue by fixing this in the original source.
Yes, we already did a workaround.
Thanks for analyzing and reporting!
This has been fixed in 8.0.691-beta. In addition to the bug in AngleSharp.Css, there was a bug in HtmlSanitizer that prevented this use case from working. The HtmlSanitizer bug has been fixed in 8.0.692 as well, but note that this use case won't work in 8.0.692 due to the bug in AngleSharp.Css 0.17.0.
When sanitizing the HTML in the file below, the sanitizer inserts a special character into the style information: dirtyhtml.txt
The inserted character is 0xFFFF, which causes an exception when the result is passed to an XML serializer: sanitiedhtml.txt
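For anyone hitting the same exception before upgrading, here is a minimal sketch of one possible stopgap: stripping characters that XML 1.0 does not allow from the sanitized string before handing it to the serializer. This is an assumption about what a workaround could look like, not the workaround used in this thread and not part of HtmlSanitizer. StripInvalidXmlChars is a hypothetical helper name, and the sample string is made up rather than taken from the attached files.

```csharp
using System;
using System.Text;
using System.Xml;

static class XmlCharFilter
{
    // Hypothetical helper (not an HtmlSanitizer API): drops every character
    // that is not allowed by the XML 1.0 Char production, such as U+FFFF.
    public static string StripInvalidXmlChars(string input)
    {
        var sb = new StringBuilder(input.Length);
        for (int i = 0; i < input.Length; i++)
        {
            if (char.IsSurrogatePair(input, i))
            {
                // A valid surrogate pair encodes U+10000..U+10FFFF,
                // which XML 1.0 allows, so keep both halves.
                sb.Append(input[i]).Append(input[i + 1]);
                i++;
            }
            else if (XmlConvert.IsXmlChar(input[i]))
            {
                sb.Append(input[i]);
            }
            // Anything else (U+FFFF, U+FFFE, lone surrogates,
            // most control characters) is skipped.
        }
        return sb.ToString();
    }
}

class Program
{
    static void Main()
    {
        // Made-up stand-in for sanitizer output containing U+FFFF.
        string sanitized = "<p style=\"color: red\uFFFF\">text</p>";
        string safe = XmlCharFilter.StripInvalidXmlChars(sanitized);

        Console.WriteLine(safe.IndexOf('\uFFFF')); // -1: the character is gone
    }
}
```

Note that this only hides the symptom. The stray character comes from the CSS serialization problem discussed in this thread, so fixing the unbalanced quotes in the source HTML or moving to a fixed version is the better route.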