zzzprojects / html-agility-pack

Html Agility Pack (HAP) is a free and open-source HTML parser written in C# to read/write DOM and supports plain XPATH or XSLT. It is a .NET code library that allows you to parse "out of the web" HTML files.
https://html-agility-pack.net
MIT License

Double <p><p> open tags leave one <p> open even with options set #538

Closed: Glorfindel88 closed this issue 9 months ago

Glorfindel88 commented 9 months ago

When parsing HTML obtained from an editor, a double <p><p> open tag does not get fully closed. Only one of the two tags is closed: <p><p> becomes <p><p></p>, even when the options OptionCheckSyntax, OptionFixNestedTags and OptionWriteEmptyNodes are set to true.

The only way to get both tags closed is to set OptionOutputAsXml to true.

Here is a fiddle of what happens:

https://dotnetfiddle.net/pi92VJ

HAP version 1.11.58, .NET Framework 4.8
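For readers who cannot open the fiddle, the report can be approximated with the minimal sketch below. The fiddle's actual contents are not reproduced here: the console program and the bare `<p><p>` input are assumptions, while the three option names are the ones listed in the report.

```csharp
using System;
using HtmlAgilityPack;

class Program
{
    static void Main()
    {
        // The three options named in the report, all set to true.
        var doc = new HtmlDocument
        {
            OptionCheckSyntax = true,
            OptionFixNestedTags = true,
            OptionWriteEmptyNodes = true
        };

        // Assumed input: two consecutive opening <p> tags and nothing else.
        doc.LoadHtml("<p><p>");

        // Per the report, HAP 1.11.58 writes only one closing </p> here.
        Console.WriteLine(doc.DocumentNode.OuterHtml);
    }
}
```

Setting `OptionOutputAsXml = true` in the same initializer is the workaround the report describes for getting both tags closed.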

JonathanMagnan commented 9 months ago

Hello @Glorfindel88,

That is expected behavior.

A p tag can be closed implicitly when followed by another p tag: https://www.w3.org/MarkUp/HTMLPlus/htmlplus_11.html#:~:text=The%20P%20element%20acts%20as%20a%20container%20for%20the%20text,tag%20as%20a%20paragraph%20separator.

See this line in the example on the page:

<P>The first piece of text<P>The second piece

Let me know if that answers your question.

Best Regards,

Jon

Glorfindel88 commented 9 months ago

Hello @JonathanMagnan, thank you for your answer. I know the tags can work that way, and that is fine as far as the browser is concerned. But I did not think this behavior was expected, given the options. A previous version of HAP that we used (I can't remember which one) did close the double tag, and we need all tags closed for other operations we then perform on the HTML. Is there any option that closes everything, or only OptionOutputAsXml?

Thanks,

Best Regards.

JonathanMagnan commented 9 months ago

Hello @Glorfindel88,

There is currently no option, but we will look to see if we can add one. Everything looks to be already coded for this, so it should be very easy.

Best Regards,

Jon

Glorfindel88 commented 9 months ago

Thank you very much.

Best Regards

JonathanMagnan commented 9 months ago

Hello @Glorfindel88,

Version 1.11.59 has been released.

In this version, we added the DisableImplicitEnd option, which you can now set to true to get the behavior you were expecting.
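A minimal sketch of how the new option might be used, assuming HAP 1.11.59 or later. The thread only gives the option's name, so treating `DisableImplicitEnd` as an instance member of `HtmlDocument` is an assumption, as are the console program and the sample input; check the member against the version you have installed.

```csharp
using System;
using HtmlAgilityPack;

class Program
{
    static void Main()
    {
        var doc = new HtmlDocument();
        doc.OptionFixNestedTags = true;

        // Assumption: DisableImplicitEnd is an instance member.
        // If your HAP version exposes it differently, adjust this line.
        doc.DisableImplicitEnd = true;

        // Assumed input: the double opening tag from the original report.
        doc.LoadHtml("<p><p>");

        // With the option enabled, both <p> tags should be written with
        // their own closing tag, as confirmed later in this thread.
        Console.WriteLine(doc.DocumentNode.OuterHtml);
    }
}
```

The option is set before `LoadHtml` because it affects how the parser treats the second `<p>`, not how the tree is serialized afterwards.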

Let me know if everything is now working as expected on your side.

Best Regards,

Jon

Glorfindel88 commented 9 months ago

Hello @JonathanMagnan, this works wonderfully, thank you very much.

Best Regards.