mganss / HtmlSanitizer

Cleans HTML to avoid XSS attacks
MIT License
1.51k stars 198 forks

RemovingTag and/or RemovingAttribute does not fire for "<BODY ONLOAD=alert('XSS')>" #546

Closed: AnderssonPeter closed this 1 month ago

AnderssonPeter commented 1 month ago

Hi, when I run

var content = "<BODY ONLOAD=alert('XSS')>";
var sanitizer = new HtmlSanitizer();
sanitizer.AllowedTags.Clear();
sanitizer.AllowedAttributes.Clear();
var isValid = true;
sanitizer.RemovingTag += (_, _) =>
    isValid = false;
sanitizer.RemovingAttribute += (_, _) =>
    isValid = false;
var sanitized = sanitizer.Sanitize(content);

isValid is still true, but sanitized is not the same as content.

For context, I'm just trying to check whether the string contains any HTML, with a few exceptions.

mganss commented 1 month ago

I think this is because the AngleSharp parser discards the body tag before HtmlSanitizer gets a chance to sanitize it. That in turn happens because the Sanitize method is intended to sanitize HTML fragments (elements inside body). If you want to sanitize a whole HTML document, you need to call SanitizeDocument.
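
A minimal sketch of the SanitizeDocument suggestion, reusing the input from the question. It rests on a few assumptions: the Ganss.Xss namespace applies to recent releases (older ones use Ganss.XSS), and the default AllowedTags are left untouched because, as far as I recall, they already include html, head, and body; clearing them would likely remove the document elements themselves. The Console output is only there to show which events fire and is not part of the original code.

```csharp
using System;
using Ganss.Xss; // assumption: recent HtmlSanitizer release; older ones use Ganss.XSS

class Program
{
    static void Main()
    {
        var content = "<BODY ONLOAD=alert('XSS')>";
        var sanitizer = new HtmlSanitizer();
        var isValid = true;

        // Flag the input as invalid whenever the sanitizer removes a tag.
        sanitizer.RemovingTag += (_, e) =>
        {
            Console.WriteLine($"Removing tag: {e.Tag.TagName} ({e.Reason})");
            isValid = false;
        };

        // Flag the input as invalid whenever the sanitizer removes an attribute.
        sanitizer.RemovingAttribute += (_, e) =>
        {
            Console.WriteLine($"Removing attribute: {e.Attribute.Name} ({e.Reason})");
            isValid = false;
        };

        // Parse the input as a whole document so the body element is kept
        // in the tree and passed through the sanitizer.
        var sanitized = sanitizer.SanitizeDocument(content);

        Console.WriteLine(sanitized);
        Console.WriteLine($"isValid: {isValid}");
    }
}
```

Note that SanitizeDocument returns a full document (html, head, and body elements), so comparing its output to the original string is not a reliable validity check; the event handlers are the better signal here.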