zzzprojects / html-agility-pack

Html Agility Pack (HAP) is a free and open-source HTML parser written in C# to read/write DOM and supports plain XPATH or XSLT. It is a .NET code library that allows you to parse "out of the web" HTML files.
https://html-agility-pack.net
MIT License

System.NullReferenceException #98

Closed udayashree closed 6 years ago

udayashree commented 6 years ago

Hi,

I often get the error below in my program. When I restart the program it works fine, but after some time I get the null reference exception again. Although the node is present on the web site, it is returned as null, as shown below.

System.NullReferenceException: 'Object reference not set to an instance of an object.'

HtmlAgilityPack.HtmlNode.SelectSingleNode(...) returned null.

Below is the line throwing the error:

HtmlNode node = document.DocumentNode.SelectSingleNode("//table[@class='tableBorder trBgBlue tdAlignC font13 fontStyle']");

This works fine most of the time, but sometimes I get the null reference exception. When I re-run my program, it works again with no exception.

JonathanMagnan commented 6 years ago

Hello @udayashree ,

Unfortunately, this issue is missing too much information to reproduce it or find the error.

I believe the issue is more likely in your code; otherwise, with the number of people using this library, it would have been reported long ago. But you never know!

If you can easily reproduce it, I recommend adding these two null checks.

if (document == null)
{
    throw new Exception("Oops! Document is null... must find out why!");
}
else if (document.DocumentNode == null)
{
    throw new Exception("Oops! DocumentNode is null... must find out why!");
}

If one of these errors is thrown, it may help you find out where the issue comes from.

I believe that, for an unknown reason, your document becomes null somewhere in your code.
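
Note that the exception text quoted at the top of this thread says that SelectSingleNode(...) itself returned null, which the two checks above would not catch. A minimal sketch of a guard on the query result, assuming the HtmlAgilityPack NuGet package is referenced; `NodeGuard` and `SelectRequired` are hypothetical names, not part of the library:

```csharp
using System;
using HtmlAgilityPack;

// Hypothetical helper for illustration; not part of HtmlAgilityPack.
public static class NodeGuard
{
    // Returns the first node matching the XPath, or throws a descriptive
    // exception instead of letting a later member access fail with a
    // NullReferenceException.
    public static HtmlNode SelectRequired(HtmlDocument document, string xpath)
    {
        if (document == null)
        {
            throw new ArgumentNullException(nameof(document));
        }

        // SelectSingleNode returns null when nothing matches the expression.
        HtmlNode node = document.DocumentNode.SelectSingleNode(xpath);
        if (node == null)
        {
            throw new InvalidOperationException("No node matched XPath: " + xpath);
        }

        return node;
    }
}
```

Calling `NodeGuard.SelectRequired(document, "//table[...]")` in place of the direct query would then report which expression failed to match, which is usually more helpful than the bare NullReferenceException.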

Let me know if you find out why this is happening.

Best Regards,

Jonathan

JonathanMagnan commented 6 years ago

Hello @udayashree ,

Do you think you could provide us additional information?


Help us to keep this library free: Donate

JonathanMagnan commented 6 years ago

Hello @udayashree ,

We are currently doing some issue cleanup by closing unanswered questions.

Feel free to reopen this issue if you can provide additional information.

Best Regards,

Jonathan

JonathanMagnan commented 6 years ago

Hello @yespetruk ,

Thank you, we will try your scenario on a few Amazon product pages to see if we can reproduce it.

Best Regards,

Jonathan

JonathanMagnan commented 6 years ago

Hello @yespetruk ,

We started to investigate this issue, but we found out that Amazon sometimes blocks your request and returns the following message in the HTML:

To discuss automated access to Amazon data please contact api-services-support@amazon.com. For information about migrating to our APIs refer to our Marketplace APIs at https://developer.amazonservices.com/ref=rm_5_sv, or our Product Advertising API at https://affiliate-program.amazon.com/gp/advertising/api/detail/main.html/ref=rm_5_ac for advertising use cases.

I believe this might be your whole issue, and it is something our library cannot do anything about.

If you believe you can reproduce this issue on a website without being blocked, let us know and we will investigate further.
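
One way to tell the two cases apart is to test the downloaded HTML for the block notice before querying it. A minimal sketch using only the base class library; `BlockDetector` and `LooksBlocked` are hypothetical names, and the marker string is simply the opening of the message quoted above. Other sites word their notices differently, so treat the marker as an assumption to adapt:

```csharp
using System;

// Hypothetical helper for illustration; not part of HtmlAgilityPack.
public static class BlockDetector
{
    // Opening words of the notice quoted above; assumed here to be a
    // reliable marker, which may not hold if the site changes its wording.
    private const string Marker = "To discuss automated access to Amazon data";

    // Returns true when the HTML is empty or contains the block notice.
    public static bool LooksBlocked(string html)
    {
        if (string.IsNullOrEmpty(html))
        {
            // An empty response has nothing to parse, so it is treated as suspect.
            return true;
        }

        return html.IndexOf(Marker, StringComparison.OrdinalIgnoreCase) >= 0;
    }
}
```

If `LooksBlocked` returns true, a null result from an XPath query most likely reflects the response that was served rather than a parsing problem.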

Best Regards,

Jonathan

udayashree commented 6 years ago

Hi Jonathan,

Meanwhile, can you help in figuring out whether the below website also blocks the request? http://racing.hkjc.com/racing/Info/meeting/Results/english/Local/20171227/HV/1

Below is my code:

string url = "http://racing.hkjc.com/racing/Info/meeting/Results/english/Local/20171227/HV/1";
var htmlDoc = new HtmlDocument();
htmlDoc.OptionReadEncoding = false;
var request = (HttpWebRequest)WebRequest.Create(url);
request.Method = "GET";
using (var response = (HttpWebResponse)request.GetResponse())
{
    using (var stream = response.GetResponseStream())
    {
        htmlDoc.Load(stream, Encoding.UTF8);
    }
}

HtmlNode tableNode = htmlDoc.DocumentNode.SelectSingleNode("(//table[contains(@class,'tdAlignC')])[2]");

The website loads correctly, but sometimes I'm getting a null value on this line:

HtmlNode tableNode = htmlDoc.DocumentNode.SelectSingleNode("(//table[contains(@class,'tdAlignC')])[2]");

JonathanMagnan commented 6 years ago

Hello @udayashree ,

I don't think that's related to our library. If you sometimes get a null value, make sure to log the HTML somewhere so that you can check for yourself whether the current URL is blocked or not.
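
A minimal sketch of such logging, using only the base class library. `HtmlSnapshot`, `Save`, and the file naming scheme are hypothetical choices for illustration, not part of the library:

```csharp
using System;
using System.Globalization;
using System.IO;
using System.Text;

// Hypothetical helper for illustration; not part of HtmlAgilityPack.
public static class HtmlSnapshot
{
    // Writes the HTML to a timestamped file in the given directory and
    // returns the full path, so the exact markup behind a failure can be
    // inspected later.
    public static string Save(string html, string directory)
    {
        // Creates the directory if needed; does nothing if it already exists.
        Directory.CreateDirectory(directory);

        string stamp = DateTime.UtcNow.ToString(
            "yyyyMMdd_HHmmss_fff", CultureInfo.InvariantCulture);
        string path = Path.Combine(directory, "snapshot_" + stamp + ".html");

        File.WriteAllText(path, html ?? string.Empty, Encoding.UTF8);
        return path;
    }
}
```

When the query returns null, calling `HtmlSnapshot.Save(htmlDoc.DocumentNode.OuterHtml, "snapshots")` keeps a copy of what was actually parsed; opening that file should show whether the table was present in the response at all.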

Unfortunately, we cannot check if every request is blocked or not ;(

Best Regards,

Jonathan

udayashree commented 6 years ago

Hi Jonathan,

I have attached a screenshot showing that the table node alone is null, for your reference.

I'm getting a depth level exception message, and the HTML is loaded correctly.

JonathanMagnan commented 6 years ago

Hello @udayashree ,

Thank you for this additional information.

We will look into this exception.

Best Regards,

Jonathan

JonathanMagnan commented 6 years ago

Hello @udayashree ,

We ran some tests and everything works fine every time.

We even ran your code in a loop:

HtmlNode tableNode = htmlDoc.DocumentNode.SelectSingleNode("(//table[contains(@class,'tdAlignC')])[2]");

if (tableNode == null)
{
    throw new Exception("Oops! table is null");
}

but unfortunately we are not able to reproduce it ;(

Are you getting the error on this page (http://racing.hkjc.com/racing/Info/meeting/Results/english/Local/20171227/HV/1)? Or is it on some other page that you try to get values from?

Best Regards,

Jonathan