zzzprojects / html-agility-pack

Html Agility Pack (HAP) is a free and open-source HTML parser written in C# that reads and writes the DOM and supports plain XPath or XSLT. It is a .NET code library that allows you to parse "out of the web" HTML files.
https://html-agility-pack.net
MIT License

Can't find a node using HtmlAgilityPack 1.11.57 #535

Closed: ovolkov closed this issue 7 months ago

ovolkov commented 8 months ago

1. Description

I found an HTML file containing the comment "Use 2 alternating methods of displaying tax due to confuse screen scrapers". I need to extract the 'Assessed Tax:' value from it. I tried two different XPath expressions, but neither finds a node:

`//ul[@id='cphMainContent_cphRightColumn_taxDue2']/li[2]`

`//*[contains(text(),'Assessed Tax:')]/ancestor::li[1]/following-sibling::li[1]`

```html
<ul id="cphMainContent_cphRightColumn_taxDue2" class="block-grid six-up mobile-four-up">
    <li>
        <label>
            <section><div><div><section><span><div>
            Assessed Tax:
        </div></span></section></div></div></section>
        </label>
    </li>
    <li>
        <span><div><span><div><span><section><span>

        $1,249.24

        </span></section></span></div></span></div></span>
    </li>
    <li>
        <div><section><span><div><div><span><div><section>
        <label>Tax Paid:</label>
        </section></div></span></div></div></span></section></div>
    </li>
    <li>
        <span><span><span>
        $0.00
        </span></span></span>
    </li>
    <li>
        <div><div><section><section><section><span><div><div><section>
        <label>Total Due:</label>
        </section></div></div></span></section></section></section></div></div>
    </li>
    <li>

        <a href='TaxDue.aspx?TaxYear=2023' class='text-blue'
            title="Click for a Tax Stub printout" data-toggle="wait">
            $1,274.22
        </a>

    </li>
</ul>
</div></div></div></section></span></section></span></div>
```

2. Exception

3. Fiddle or Project

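Tester.zip

```csharp
using System;
using System.IO;
using HtmlAgilityPack;

// Load the saved page from the application's base directory.
var filepath = Path.Combine(AppDomain.CurrentDomain.BaseDirectory, "notfoundTpage.html");
var doc = new HtmlDocument();
doc.Load(filepath);

// Attempt 1: select the second <li> of the tax list by position.
var node = doc.DocumentNode.SelectSingleNode("//ul[@id='cphMainContent_cphRightColumn_taxDue2']/li[2]");
if (node == null)
{
    Console.WriteLine("XPATH not found 1");
}

// Attempt 2: find the 'Assessed Tax:' label, then take the <li> that follows its <li>.
node = doc.DocumentNode.SelectSingleNode("//*[contains(text(),'Assessed Tax:')]/ancestor::li[1]/following-sibling::li[1]");
if (node == null)
{
    Console.WriteLine("XPATH not found 2");
}
```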

4. Any further technical details

.NET 7.0, HtmlAgilityPack 1.11.57

JonathanMagnan commented 8 months ago

Hello @ovolkov,

The changes we made in v1.11.57 were reverted yesterday.

I just upgraded your project to v1.11.58 and everything works fine again.

Let me know if that's fixed for you.

Best Regards,

Jon
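The upgrade can also be checked without the attached project. What follows is only a minimal sketch, assuming HtmlAgilityPack v1.11.58 or later is referenced from NuGet. The inline `html` string is a shortened copy of the fragment from the report (the nesting is reduced for readability), and the `ReadAssessedTax` helper name is hypothetical, not part of the library.

```csharp
using System;
using HtmlAgilityPack;

// Shortened copy of the reported fragment. Assumption: the real page wraps
// the label and the value in more nested elements than shown here.
const string html = @"
<ul id=""cphMainContent_cphRightColumn_taxDue2"">
  <li><label><section><div>Assessed Tax:</div></section></label></li>
  <li><span><div><span>
      $1,249.24
  </span></div></span></li>
</ul>";

var doc = new HtmlDocument();
doc.LoadHtml(html);

// Print the extracted value, or the same kind of message the report used.
Console.WriteLine(ReadAssessedTax(doc) ?? "XPATH not found");

// Hypothetical helper: returns the trimmed text of the second <li> in the
// tax list, or null when the XPath matches nothing.
static string? ReadAssessedTax(HtmlDocument document)
{
    var node = document.DocumentNode.SelectSingleNode(
        "//ul[@id='cphMainContent_cphRightColumn_taxDue2']/li[2]");

    // InnerText concatenates all descendant text; decode entities and trim
    // the whitespace that surrounds the value in the markup.
    return node == null ? null : HtmlEntity.DeEntitize(node.InnerText).Trim();
}
```

Selecting the `<li>` by position and reading `InnerText` sidesteps the wrapper elements entirely, so the varying nesting depth inside each item does not matter to the query.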

ovolkov commented 8 months ago

Thanks! Everything is working fine!

On Mon, Jan 29, 2024 at 15:53, Jonathan Magnan <@.***> wrote:

> Hello @ovolkov,
>
> The changes we made in v1.11.57 were reverted yesterday.
>
> I just upgraded your project to v1.11.58 and everything works fine again.
>
> Let me know if that's fixed for you.
>
> Best Regards,
>
> Jon
