zzzprojects / html-agility-pack

Html Agility Pack (HAP) is a free and open-source HTML parser written in C# that reads and writes the DOM and supports plain XPath or XSLT. It is a .NET code library that allows you to parse "out of the web" HTML files.
https://html-agility-pack.net
MIT License

How to bind a nested class property to the attribute of html element with GetEncapsulatedData Method? #505

Closed rwecho closed 1 year ago

rwecho commented 1 year ago

https://github.com/zzzprojects/html-agility-pack/blob/c12580b2025144d618d3397dec1fd81132e3f822/src/HtmlAgilityPack.Shared/HtmlNode.Encapsulator.cs#L298

In most situations, I like to bind an element's attribute to a property with the [XPath] attribute. In a nested class, however, the binding cannot access the parent node, whose attributes often carry important information such as an id.


    // Requires: using System.Collections.Generic; using HtmlAgilityPack;
    [HasXPath]
    class TestA
    {
        [XPath("a")]
        public List<Item> Items { get; set; }

        [HasXPath]
        public class Item
        {
            [XPath(".", "href")] // not working
            [SkipNodeNotFound]
            public string Href { get; set; }

            [XPath(".")]
            [SkipNodeNotFound]
            public string Name { get; set; }
        }
    }

    var html = @"
<div>
<div>Hello
<a href='1.html'>1.html</a>
<a href='2.html'>2.html</a>
</div>
<div>World</div>
</div>
";

    var document = new HtmlDocument();
    document.LoadHtml(html);
    var testA = document.DocumentNode.GetEncapsulatedData<TestA>();

At HtmlNode.Encapsulator.cs#L298, would using OuterHtml be better than InnerHtml?

elgonzo commented 1 year ago

At HtmlNode.Encapsulator.cs#L298, would using OuterHtml be better than InnerHtml?

The bug seems to be that the code here does not honor the [XPath] attribute's NodeReturnType value, which is supposed to govern whether InnerText, InnerHtml or OuterHtml is used to produce the result for a property.
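
To illustrate why the choice matters for the nested Item class: an element's attributes are part of its OuterHtml but not of its InnerHtml, so a nested object built from the InnerHtml alone has no href left to bind. The following is a minimal sketch using only HAP's public HtmlNode properties. The class and method names (OuterVsInnerSketch, Describe) are made up for this illustration, and the code does not reproduce what the encapsulator does internally.

```csharp
using System;
using HtmlAgilityPack;

public static class OuterVsInnerSketch
{
    // Hypothetical helper: returns the InnerHtml and OuterHtml of the
    // first <a> element found in the given HTML fragment.
    public static (string Inner, string Outer) Describe(string html)
    {
        var document = new HtmlDocument();
        document.LoadHtml(html);
        HtmlNode anchor = document.DocumentNode.SelectSingleNode("//a");
        return (anchor.InnerHtml, anchor.OuterHtml);
    }

    public static void Main()
    {
        var (inner, outer) = Describe("<div><a href='1.html'>1.html</a></div>");
        Console.WriteLine(inner); // element content only, no href attribute
        Console.WriteLine(outer); // the whole <a ...> element, including href
    }
}
```

With the OuterHtml, an XPath evaluated for the nested type can still reach the a element itself and therefore its attributes.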

HtmlNode.Encapsulator.cs#L298:

https://github.com/zzzprojects/html-agility-pack/blob/c12580b2025144d618d3397dec1fd81132e3f822/src/HtmlAgilityPack.Shared/HtmlNode.Encapsulator.cs#L298

isn't the only place exhibiting this issue. HtmlNode.Encapsulator.cs#L180 also fails to take XPathAttribute.NodeReturnType into account: https://github.com/zzzprojects/html-agility-pack/blob/c12580b2025144d618d3397dec1fd81132e3f822/src/HtmlAgilityPack.Shared/HtmlNode.Encapsulator.cs#L180

Once proper support for the XPathAttribute.NodeReturnType property has been implemented, this should work:

[HasXPath]
class TestA
{
      [XPath("a", NodeReturnType = ReturnType.OuterHtml)]    // <--------- setting NodeReturnType here
      public List<Item> Items { get; set; }

      [HasXPath]
      public class Item
      {
          [XPath(".", "href")]
          [SkipNodeNotFound]
          public string Href { get; set; }

          [XPath(".")]
          [SkipNodeNotFound]
          public string Name { get; set; }
      }
}

elgonzo commented 1 year ago

Side note: There is a related issue that also warrants fixing when proper NodeReturnType support is implemented. Currently, when a property of a simple type like string has an [XPath] attribute with an invalid NodeReturnType value, a bare-bones, nondescript System.Exception is thrown:

https://github.com/zzzprojects/html-agility-pack/blob/c12580b2025144d618d3397dec1fd81132e3f822/src/HtmlAgilityPack.Shared/HtmlNode.Encapsulator.cs#L629

This should be changed into a meaningful exception with a message that states the concrete cause of the exception.
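
As a rough sketch of what such a descriptive failure could look like, the helper below maps a ReturnType value to the matching HtmlNode property and rejects anything else with a message naming the offending value. The names NodeReturnTypeSketch and SelectValue are hypothetical, and the use of ArgumentOutOfRangeException is an assumption made for this illustration; the actual fix in the library may use a different exception type and message.

```csharp
using System;
using HtmlAgilityPack;

public static class NodeReturnTypeSketch
{
    // Hypothetical helper: picks the node text that the given ReturnType asks for.
    public static string SelectValue(HtmlNode node, ReturnType returnType)
    {
        switch (returnType)
        {
            case ReturnType.InnerText:
                return node.InnerText;
            case ReturnType.InnerHtml:
                return node.InnerHtml;
            case ReturnType.OuterHtml:
                return node.OuterHtml;
            default:
                // Assumed exception type; the point is that the message
                // states the concrete cause instead of a bare System.Exception.
                throw new ArgumentOutOfRangeException(
                    nameof(returnType),
                    returnType,
                    "Unsupported NodeReturnType value on the [XPath] attribute.");
        }
    }
}
```

Calling SelectValue(node, (ReturnType)99) would then fail with an exception that reports both the parameter name and the invalid value, which is easier to diagnose than a plain System.Exception.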

rwecho commented 1 year ago

@elgonzo I created a PR based on your advice. Could you please review it?

JonathanMagnan commented 1 year ago

Hello @rwecho ,

We will close this issue as your PR has been merged and is now available in v1.11.50.

Best Regards,

Jon