Open florianculie opened 12 months ago
I can't reproduce the issue. Can you try and add a code snippet here that shows the issue?
This is what I tried:
var html = @"<p style=""text-align: start;"">this is the content of the p tag</p>";
var sanitizer = new HtmlSanitizer();
sanitizer.AllowedCssProperties.Remove("text-align");
sanitizer.AllowedAttributes.Remove("style");
sanitizer.RemovingAttribute += (s, e) =>
{
Assert.Equal("text-align: start;", e.Tag.Attributes[0].Value);
};
var output = sanitizer.Sanitize(html);
Here's the Program.cs content from a new .NET 6 Console App :
using Ganss.Xss;
namespace ConsoleApp1
{
internal class Program
{
static void Main(string[] args)
{
Console.WriteLine("Hello, World!");
var html = @"<p style=""text-align: start;"">this is the content of the p tag</p>";
var sanitizer = new HtmlSanitizer();
//If you comment the two following lines, you will reproduce
sanitizer.AllowedCssProperties.Remove("text-align");
sanitizer.AllowedAttributes.Remove("style");
//
sanitizer.RemovingAttribute += (s, e) =>
{
string test = e.Tag.Attributes[0].Value;
};
var output = sanitizer.Sanitize(html);
}
}
}
If I use your code, with the two .Remove calls, I have no issue. However, if I comment those two lines (since I don't want to disallow them), you can see that the
e.Tag.Attributes[0].Value;
is an empty string, and the e.Tag.OuterHtml does not have the style value :
OK thanks, I can repro now. This occurs because the parsed style is empty in AngleSharp's CSSOM. Strangely, it does not occur if the style's value is text-align: left
for example. On MDN it says for the initial value of text-align
start, or a nameless value that acts as left if direction is ltr, right if direction is rtl if start is not supported by the browser.
Perhaps start
is not supported by AngleSharp and that's why it's not reflected in the OM? I'll open an issue in the AngleSharp.Css repo to find out what's going on.
I have reported as https://github.com/AngleSharp/AngleSharp.Css/issues/151
The start
value for text-align
is not yet supported by AngleSharp.Css.
For what it's worth, version 4.0.1830 uses AngleSharp 0.9.9.1. AngleSharp split the CSS processing to a separate library in 0.10.0, so I assume any version of HtmlSanitizer from v5 onwards (which uses AngleSharp 0.13.0) probably behaves differently with respect to CSS parsing.
Sorry for the delay in response. Thanks for your analysis. As it seems it's an invalid value, I think we will remove the style attribute from the problematic data for the project.
If I am correct with my understanding, we need to wait for Anglesharp to fix the issue your declared earlier, then migrate to the latest pre-release of HtmlSanitizer that uses this AngleSharp release. Is that it ? If so, do you happen to have a "know issues" with the newest version of AngleSharp ?
Thanks for your time on this subject.
@tiesont You're right. This scenario used to work before AngleSharp 0.10, see https://github.com/AngleSharp/AngleSharp.Css/issues/151#issuecomment-1823222961
@florianculie My hope is that the next release of AngleSharp.Css will be 1.0 which would mean you would no longer need a prerelease version of HtmlSanitizer. But in general, yes, you would need to wait for a new release of AngleSharp.Css.
I'm not aware of any issues with the newest versions of AngleSharp and AngleSharp.Css. This issue is the only regression I'm aware of.
Hi,
We are using the HtmlSanitizer 8.0.746 release to filter user input. To be precise, we do not sanitize it, we forbid the user to write the data if HtmlSanitizer triggers a RemovingN event.
We are having problems regarding the following HTML string :
We have a custom RemovingAttributeEvent that we are adding like this :
The code of our custom event is the following :
To put it simply, base on the logic I described at the begining, we cancel the RemovingAttribute Event if the reason for the trigger is considered safe for us, otherwise we confirm the trigger, and we add the reason to the _error object to log all reasons.
The problem we are facing is the fact that the event is being triggered with an "incomplete" html, as you can see below : Which gives us an args.Attribute.Name = "style" but an empty value for the actual style.
During the investigation of the issue, I did notice we were using the 6.0 version of your package, and by updating to 8.0, I still reproduce it. On a side note, we have been using your package for 6 year now, and we do not reproduce this issue with the 4.0.183 version, the version being used in an old version of our software.
Do you happen to have any idea if this is a normal behaviour or a bug ?