zzzprojects / html-agility-pack

Html Agility Pack (HAP) is a free and open-source HTML parser written in C# to read/write DOM and supports plain XPATH or XSLT. It is a .NET code library that allows you to parse "out of the web" HTML files.
https://html-agility-pack.net
MIT License

parse links/url encode error #488

Closed andrey-q closed 2 years ago

andrey-q commented 2 years ago


1. Description

Links parsed from the page come back incorrectly encoded. Code sample:

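// Assumes "using HtmlAgilityPack;" and that textBoxURL / textBoxNav are text boxes on the form.
textBoxURL.Text = "https://uk.wiktionary.org/wiki/%D0%9A%D0%B0%D1%82%D0%B5%D0%B3%D0%BE%D1%80%D1%96%D1%8F:%D0%A3%D0%BA%D1%80%D0%B0%D1%97%D0%BD%D1%81%D1%8C%D0%BA%D1%96_%D1%96%D0%BC%D0%B5%D0%BD%D0%BD%D0%B8%D0%BA%D0%B8";
textBoxNav.Text = "//table//tr[2]//a";
var web = new HtmlWeb(); // downloads and parses the page
var htmlDoc = web.Load(textBoxURL.Text);
// Note: SelectNodes returns null when the XPath matches nothing.
var nodesNav = htmlDoc.DocumentNode.SelectNodes(textBoxNav.Text);
foreach (var node in nodesNav) {
    string url = node.GetAttributeValue("href", "default"); // href attribute value of the link
    string text = node.InnerText;                           // visible link text
}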

The loop returned the following URLs:

1. Received: https://uk.wiktionary.org/w/index.php?title=%D0%9A%D0%B0%D1%82%D0%B5%D0%B3%D0%BE%D1%80%D1%96%D1%8F:%D0%A3%D0%BA%D1%80%D0%B0%D1%97%D0%BD%D1%81%D1%8C%D0%BA%D1%96_%D1%96%D0%BC%D0%B5%D0%BD%D0%BD%D0%B8%D0%BA%D0%B8&from=А

Expected: https://uk.wiktionary.org/w/index.php?title=%D0%9A%D0%B0%D1%82%D0%B5%D0%B3%D0%BE%D1%80%D1%96%D1%8F:%D0%A3%D0%BA%D1%80%D0%B0%D1%97%D0%BD%D1%81%D1%8C%D0%BA%D1%96_%D1%96%D0%BC%D0%B5%D0%BD%D0%BD%D0%B8%D0%BA%D0%B8&from=%D0%90

2. Received: https://uk.wiktionary.org/w/index.php?title=%D0%9A%D0%B0%D1%82%D0%B5%D0%B3%D0%BE%D1%80%D1%96%D1%8F:%D0%A3%D0%BA%D1%80%D0%B0%D1%97%D0%BD%D1%81%D1%8C%D0%BA%D1%96_%D1%96%D0%BC%D0%B5%D0%BD%D0%BD%D0%B8%D0%BA%D0%B8&from=Б

Expected: https://uk.wiktionary.org/w/index.php?title=%D0%9A%D0%B0%D1%82%D0%B5%D0%B3%D0%BE%D1%80%D1%96%D1%8F:%D0%A3%D0%BA%D1%80%D0%B0%D1%97%D0%BD%D1%81%D1%8C%D0%BA%D1%96_%D1%96%D0%BC%D0%B5%D0%BD%D0%BD%D0%B8%D0%BA%D0%B8&from=%D0%91

And so on for the remaining links.

2. Exception


Exception message:
Stack trace:

3. Fiddle or Project

Fiddle that reproduces the issue: https://dotnetfiddle.net/25Vjsn

andrey-q commented 2 years ago

OK, I got it: I should use the WebUtility.HtmlDecode(html) function.
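
For anyone landing here later, a minimal sketch of what that call does to a link value. HrefHelper and DecodeHref are hypothetical names made up for illustration, and the sample string in the usage note is invented rather than taken from the Wiktionary page. WebUtility.HtmlDecode (in System.Net) turns HTML entities such as &amp; back into plain characters and leaves percent-encoded sequences like %D0%90 untouched.

```csharp
using System;
using System.Net;

// Hypothetical helper, not part of Html Agility Pack.
public static class HrefHelper
{
    // Decodes HTML entities (for example "&amp;" -> "&") in a raw href value.
    // Percent-encoded bytes such as "%D0%90" are left as they are.
    public static string DecodeHref(string rawHref)
    {
        return WebUtility.HtmlDecode(rawHref);
    }
}
```

Inside the loop from the original post this would presumably be used as HrefHelper.DecodeHref(node.GetAttributeValue("href", "")). For the made-up input "/w/index.php?title=Example&amp;from=%D0%90" it returns "/w/index.php?title=Example&from=%D0%90".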