mganss / HtmlSanitizer

Cleans HTML to avoid XSS attacks
MIT License
1.52k stars, 198 forks

StackOverflowException when trying to sanitize #417

joffremota closed this issue 11 months ago

joffremota commented 1 year ago

I'm using HtmlSanitizer to sanitize HTML extracted from EML files. I'm getting a StackOverflowException and would like to know if there is anything I can do to fix it. While debugging my web application, this is the error I receive; when deployed, IIS crashes.

[Screenshot: the StackOverflowException shown while debugging]

I'm using the latest HtmlSanitizer version and would appreciate any help.

tiesont commented 1 year ago

A minimal code example that reproduces your exception would be really helpful, along with a few more details:

joffremota commented 1 year ago

Hello @tiesont . Thanks for your response.

My HTML has lots of divs (more than 5,000 fragments). The file is only 350 KB. My code is below; the "htmlBody" variable holds the HTML I'm having trouble with.

public string Sanitize()
{
    var htmlBody = emailBody;

    if (sanitizerLogic == EmailSanitizerLogic.RemoveAndKeepContent)
    {
        sanitizer.KeepChildNodes = true;
    }

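    // Both helpers subscribe a new RemovingAttribute handler every time Sanitize() runs;
    // if "sanitizer" is a shared instance, configuring it once avoids piling up handlers.
    AllowBase64Images();
    AllowCidImages();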

    foreach (var newTag in additionalTags)
    {
        if (string.IsNullOrWhiteSpace(newTag)) continue;
        sanitizer.AllowedTags.Add(newTag.Trim());
    }

    foreach (var newAttr in additionalAttributes)
    {
        if (string.IsNullOrWhiteSpace(newAttr)) continue;
        sanitizer.AllowedAttributes.Add(newAttr.Trim());
    }

    var sanitizedHtml = sanitizer.Sanitize(htmlBody, "");

    return sanitizedHtml;
}

private void AllowBase64Images()
{
    // Allows data-* attributes (unrelated to data: URIs).
    sanitizer.AllowDataAttributes = true;
    sanitizer.AllowedSchemes.Add("data");
    // Keep very long data: URIs (>= 0xfff0 chars) that would otherwise be removed
    // because they exceed System.Uri's maximum length.
    sanitizer.RemovingAttribute += (s, e) =>
    {
        var isNotAllowed = e.Reason == RemoveReason.NotAllowedAttribute || e.Reason == RemoveReason.NotAllowedUrlValue;
        var isLong = e.Attribute.Value.Length >= 0xfff0;
        var startsWithData = e.Attribute.Value.StartsWith("data:", StringComparison.OrdinalIgnoreCase);
        e.Cancel = isNotAllowed && isLong && startsWithData;
    };
}

private void AllowCidImages()
{
    sanitizer.AllowDataAttributes = true;
    sanitizer.AllowedSchemes.Add("cid");
    // Same pattern for cid: URIs (inline MIME parts referenced by Content-ID).
    // cid: values are normally short, so the length condition will rarely match.
    sanitizer.RemovingAttribute += (s, e) =>
    {
        var isNotAllowed = e.Reason == RemoveReason.NotAllowedAttribute || e.Reason == RemoveReason.NotAllowedUrlValue;
        var isLong = e.Attribute.Value.Length >= 0xfff0;
        var startsWithCid = e.Attribute.Value.StartsWith("cid:", StringComparison.OrdinalIgnoreCase);
        e.Cancel = isNotAllowed && isLong && startsWithCid;
    };
}

If that's OK, I can post an external link to the HTML file. Below is a screenshot of part of it.

[Screenshot: part of the HTML file]
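Since the file contains thousands of div fragments, one quick check is how deeply they nest. If many of them are left unclosed, an HTML parser nests each one inside the previous one, and any code that walks the resulting tree recursively needs stack space roughly proportional to that depth. That this is the cause here is an assumption, not something confirmed in this thread. The helper below (NestingProbe.MaxNestingDepth, a hypothetical name, not part of HtmlSanitizer) is a rough regex-based estimate, not a real parser:

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical diagnostic helper (not part of HtmlSanitizer). It scans raw HTML
// for opening/closing tags of one element and reports the deepest nesting level.
// It ignores comments, <script> contents and parser error recovery, so treat the
// result as a rough estimate only.
public static class NestingProbe
{
    public static int MaxNestingDepth(string html, string tagName)
    {
        // Matches <div ...> and </div> case-insensitively; \b stops <divider> from matching.
        // HTML parsers ignore a trailing slash on non-void elements such as div,
        // so <div/> is counted as an opening tag as well.
        var tagPattern = new Regex(
            @"<(/?)" + Regex.Escape(tagName) + @"\b[^>]*>",
            RegexOptions.IgnoreCase);

        int depth = 0;
        int maxDepth = 0;
        foreach (Match m in tagPattern.Matches(html))
        {
            bool isClosing = m.Groups[1].Value == "/";
            if (isClosing)
            {
                // Stray closing tags are clamped at zero instead of going negative.
                depth = Math.Max(0, depth - 1);
            }
            else
            {
                depth++;
                maxDepth = Math.Max(maxDepth, depth);
            }
        }
        return maxDepth;
    }
}
```

A result in the thousands would suggest the divs are nested rather than siblings; a small number would point away from nesting depth as the trigger.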

mganss commented 1 year ago

@joffremota Yes, please attach the HTML here.

joffremota commented 1 year ago

Here it is, @mganss

file.zip

mganss commented 1 year ago

Thanks. I can't repro, though. What's in additionalTags and additionalAttributes?

joffremota commented 1 year ago

@mganss, additionalAttributes has only one item, "class", while additionalTags contains the following items:

The error occurs only inside IIS, which crashes.

[Screenshot: the IIS crash]

When I put the HTML in a string and run it through a unit test, I can't reproduce it either.
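If the real file doesn't reproduce the problem in a test, a synthetic input can make the test independent of it. This hypothetical helper (NestedHtml.Build, not part of HtmlSanitizer) generates a chosen number of nested div elements, on the assumption that nesting depth is what matters:

```csharp
using System;
using System.Text;

// Hypothetical test-input generator: wraps innerText in `depth` nested <div>
// elements, e.g. Build(2) returns "<div><div>x</div></div>".
public static class NestedHtml
{
    public static string Build(int depth, string innerText = "x")
    {
        if (depth < 0) throw new ArgumentOutOfRangeException(nameof(depth));

        // "<div>" is 5 chars and "</div>" is 6, so each level adds 11 chars.
        var sb = new StringBuilder(depth * 11 + innerText.Length);
        for (int i = 0; i < depth; i++) sb.Append("<div>");
        sb.Append(innerText);
        for (int i = 0; i < depth; i++) sb.Append("</div>");
        return sb.ToString();
    }
}
```

A test could pass, for example, NestedHtml.Build(5000) to sanitizer.Sanitize and raise the depth until something fails. The depth at which a console test runner fails may differ from IIS, since the two can run on threads with different stack sizes.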

mganss commented 1 year ago

I suspect this occurs because the maximum stack size in IIS is smaller than in a console app. You could try creating a new thread with a larger stack size (for example 1 MB or 4 MB) and running the sanitizer on that thread.
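A minimal sketch of that suggestion, assuming the sanitizer call can simply be wrapped in a Func. LargeStackRunner and its 4 MB default are illustrative choices, not HtmlSanitizer API; the Thread(ThreadStart, int maxStackSize) constructor and ExceptionDispatchInfo are standard .NET:

```csharp
#nullable enable
using System;
using System.Runtime.ExceptionServices;
using System.Threading;

// Hypothetical helper: runs a (possibly deeply recursive) operation on a dedicated
// thread whose maximum stack size is set explicitly, then returns its result.
public static class LargeStackRunner
{
    public static T Run<T>(Func<T> work, int maxStackSizeBytes = 4 * 1024 * 1024)
    {
        T result = default!;
        ExceptionDispatchInfo? failure = null;

        var thread = new Thread(() =>
        {
            try
            {
                result = work();
            }
            catch (Exception ex)
            {
                // Ordinary exceptions are captured and rethrown on the caller's thread.
                // A StackOverflowException cannot be caught; it still ends the process.
                failure = ExceptionDispatchInfo.Capture(ex);
            }
        }, maxStackSizeBytes);

        thread.Start();
        thread.Join(); // the calling (request) thread waits for the work to finish

        failure?.Throw(); // rethrow with the original stack trace preserved
        return result;
    }
}
```

In the Sanitize() method posted earlier, the call would become something like var sanitizedHtml = LargeStackRunner.Run(() => sanitizer.Sanitize(htmlBody, ""));. Because a StackOverflowException can't be caught, this only helps if the larger stack is actually enough; it does not turn an overflow into a recoverable error. Starting a thread per call also adds some overhead.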