zzzprojects / html-agility-pack

Html Agility Pack (HAP) is a free and open-source HTML parser written in C# to read/write DOM and supports plain XPATH or XSLT. It is a .NET code library that allows you to parse "out of the web" HTML files.
https://html-agility-pack.net
MIT License
2.65k stars 375 forks source link

Null reference exception generating HAP #475

Open ivanlabsii opened 2 years ago

ivanlabsii commented 2 years ago

Here is what to include in your request to make sure we implement a solution as quickly as possible.

1. Description

I have received this exception from the AppCenter so it is generated on the user device, as such I cannot provide exact way to reproduce it, but I think that the stacktrace should be quite telling.

2. Exception

If you are seeing an exception, include the full exception details (message and stack trace).

Exception message:
Dictionary`2[TKey,TValue].FindEntry (TKey key)
System.NullReferenceException: Object reference not set to an instance of an object.
Stack trace:
System.Collections.Generic
Dictionary`2[TKey,TValue].FindEntry (TKey key)
System.Collections.Generic
Dictionary`2[TKey,TValue].TryGetValue (TKey key, TValue& value)
System.Text
Encoding.GetEncoding (System.Int32 codepage)
System.Text
Encoding.GetEncoding (System.String name)
HtmlAgilityPack
HtmlDocument.ReadDocumentEncoding (HtmlAgilityPack.HtmlNode node)
HtmlAgilityPack
HtmlDocument.PushNodeEnd (System.Int32 index, System.Boolean close)
HtmlAgilityPack
HtmlDocument.Parse ()
HtmlAgilityPack
HtmlDocument.Load (System.IO.TextReader reader)
HtmlAgilityPack
HtmlDocument.LoadHtml (System.String html)

3. Fiddle or Project

If you are able,

Provide a Fiddle that reproduce the issue: https://dotnetfiddle.net/25Vjsn

Or provide a project/solution that we can run to reproduce the issue.

Otherwise, make sure to include as much information as possible to help our team to reproduce the issue.

4. Any further technical details

Add any relevant detail can help us, such as:

elgonzo commented 2 years ago

This looks basically like a bug in whatever Xamarin.IOS version the device was using.

Note how HAP invokes Encoding.GetEncoding(string) (not a HAP method, but a method being part of .NET's framework classes) and then the NRE originating deep within the invocation chain of this Encoding.GetEncoding method.

Encoding.GetEncoding(string) should never throw a NRE -- it is a violation of its API specification. It must only throw an ArgumentException (or any derived exception type, such as ArgumentNullException) when an encoding is requested that the runtime or the underlying platform does not support.


However, i believe that HAP could relatively easily work around this issue. HAP already handles ArgumentExceptions thrown by Encoding.GetEncoding(string), and in my entirely subjective opinion the respective try-catch clause in HtmlDocument.ReadDocumentEncoding could be "broaden" to catch any exception (not just ArgumentException), thus basically becoming able to work around Encoding.GetEncoding throwing an unexpected exception type.

(As a side note, in my opinion, the HtmlDocument.ReadDocumentEncoding method needs to be reviewed by the team anyway. While looking at it for understanding your issue, i noticed that for certain build targets, it does if (_declaredencoding.WebName != _streamencoding.WebName) without checking whether _declaredencoding is being null or not despite _declaredencoding possibly being null at that point. This is an issue -- albeit unrelated to the one reported above -- that should be addressed at an appropriate time.)