zzzprojects / html-agility-pack

Html Agility Pack (HAP) is a free and open-source HTML parser written in C# to read/write DOM and supports plain XPATH or XSLT. It is a .NET code library that allows you to parse "out of the web" HTML files.
https://html-agility-pack.net
MIT License
2.65k stars 375 forks source link

Stackoverflow when using ReplaceChild #513

Closed enkelmedia closed 1 year ago

enkelmedia commented 1 year ago

I just upgraded the dependency on html-agility-pack from an older version to 1.11.52 and suddenly started to get Stack overflow exceptions.

1. Description

The problem happens in a couple of areas that I have identified.

Using ReplaceChild (when replaced node has already been added to something else)

wrapperDivNode.ChildNodes.Add(nestedTable);
node.ParentNode.ReplaceChild(wrapperDivNode, nestedTable);

(details below)

and:

node.Attributes.RemoveAll();

The workaround here was to just iterate a new list and remove:

 foreach (var attr in node.Attributes.ToList())
 {
     node.Attributes.Remove(attr);
 }

There is a sample of the problem:

var tables = doc.DocumentNode.SelectNodes("//table");

if (tables != null && tables.Any())
{

    foreach (var table in tables)
    {
        var nestedTable = table;

        if (nestedTable != null)
        {
            // Wrap the table in a div "table-wrapper" so that we can apply "mobile table"-css.
            var wrapperDivNode = HtmlNode.CreateNode("<div class=\"table-wrapper\"></div>");

            wrapperDivNode.ChildNodes.Add(nestedTable);

            nestedTable.ParentNode.ReplaceChild(wrapperDivNode, nestedTable);

        }

    }
}

2. Exception

Could not get a good stack trace because of the stack overflow.

3. Fiddle or Project

Error: https://dotnetfiddle.net/YJJ2jq

Moving the "ReplaceChild" before the "ChildNodes.Add" makes it work: https://dotnetfiddle.net/sJCUmN

This used to work in older versions and it might be fair that it does not work but there should be a better exception and not a stack overflow.

Works:

4. Any further technical details

Add any relevant detail can help us, such as:

JonathanMagnan commented 1 year ago

Hello @enkelmedia ,

Thank you for the project. My developer will look at it.

Best Regards,

Jon


Sponsorship Help us improve this library

Performance Libraries context.BulkInsert(list, options => options.BatchSize = 1000); Entity Framework ExtensionsDapper Plus

Runtime Evaluation Eval.Execute("x + y", new {x = 1, y = 2}); // return 3 C# Eval Function

JonathanMagnan commented 1 year ago

Hello @enkelmedia ,

A custom exception pointing to this issue has been added in the v1.11.53

The issue was caused because the ChildNodes of a Child contained the same list of ChildNodes as his parent, which led to a stack overflow exception. See this part: https://github.com/zzzprojects/html-agility-pack/blob/7c35d8f8f698a76a57dafe22d3d23b65b3815cd3/src/HtmlAgilityPack.Shared/HtmlNode.cs#L1003-L1016

Another way to make it work is to clone the current node that you want to replace and use it:

if (tables != null && tables.Any())
{
    foreach (var table in tables)
    {
        var nestedTable = table.Clone();

        // Wrap the table in a div "table-wrapper" so that we can apply "mobile table"-css.
        var wrapperDivNode = HtmlNode.CreateNode("<div class=\"table-wrapper\"></div>");

        wrapperDivNode.ChildNodes.Add(nestedTable);

        table.ParentNode.ReplaceChild(wrapperDivNode, table);
    }
}

Let me know if everything is fine for you or there is something else related to this issue.

Best Regards,

Jon


Are you finding this library useful? If so, please consider supporting its continued development by becoming a sponsor.

Additionally, if you think your enterprise would benefit from this library, we encourage you to suggest they become a sponsor as well. Your support will help ensure this library remains alive and well-supported.