Closed: Davidsv closed this 11 months ago
The <link> element in HTML does not support any content apart from attributes and therefore has no end tag (see the specification). HAP, being an HTML parser, tries to parse it as a regular HTML <link> element, which is why you get the output you see.
Looking around in HAP's source code, there seems to be a way to achieve what you want. The HtmlAgilityPack.HtmlNode class maintains a static dictionary, HtmlNode.ElementsFlags, that assigns certain element characteristics to certain element names. For the link element name, the dictionary characterizes it as an empty element.
Since HtmlNode.ElementsFlags is publicly accessible, it is sufficient to remove the entry for link from this dictionary to get the desired result:
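// Stop HAP from treating <link> as an empty element (affects all parsing in the process).
HtmlNode.ElementsFlags.Remove("link");
var html = @"<root><link>foo</link><url>bar</url></root>";
var doc = new HtmlDocument();
// Parse after the flag has been removed so that <link> keeps its content and end tag.
doc.LoadHtml(html);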
...
Note that because HtmlNode.ElementsFlags is a static field, modifying or replacing its assigned dictionary will affect all parsing done by HAP in your application.
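If the global side effect is a concern, one option is to remove the entry only for the duration of a single parse and put it back afterwards. The following is a minimal sketch, not something taken from HAP itself: the class and method names are hypothetical, and it assumes a HAP version in which HtmlNode.ElementsFlags is a Dictionary<string, HtmlElementFlag>. It is not thread-safe, because other threads parsing at the same moment would also see the modified dictionary. I have also not checked how serialization treats a <link> element without children once the entry is restored, so verify the output for your own input.

```csharp
using System;
using HtmlAgilityPack;

// Hypothetical helper, not part of HAP: parses a string while "link" is
// temporarily treated as a regular (non-empty) element.
static class LinkAsRegularElement
{
    public static HtmlDocument Load(string html)
    {
        // Remember the current flag (if any) so it can be restored afterwards.
        bool hadFlag = HtmlNode.ElementsFlags.TryGetValue("link", out HtmlElementFlag saved);
        HtmlNode.ElementsFlags.Remove("link");
        try
        {
            var doc = new HtmlDocument();
            doc.LoadHtml(html);
            return doc;
        }
        finally
        {
            // Restore the original entry so that later parsing behaves as before.
            if (hadFlag)
            {
                HtmlNode.ElementsFlags["link"] = saved;
            }
        }
    }
}

class Program
{
    static void Main()
    {
        var doc = LinkAsRegularElement.Load("<root><link>foo</link><url>bar</url></root>");
        Console.WriteLine(doc.DocumentNode.OuterHtml);
    }
}
```

Whether this is preferable to a one-time Remove at application startup depends on whether the same process also needs to parse real HTML pages, where <link> should keep its usual meaning.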
(P.S.: I am just a user of HAP and not associated with the project nor its authors/maintainers.)
Thank you @elgonzo for your help again. Your answer is 100% correct.
Let us know if you have any additional questions about this, @Davidsv.
Best Regards,
Jon
Perfect, this is good enough for me. Thank you both. Closing.
Description
If I pass a raw string that contains <link>foo</link> to htmlDocument.LoadHtml(raw) and then output htmlDocument.DocumentNode.OuterHtml, it shows up as <link>foo (without the closing tag).
Similarly, if I set htmlDocument.OptionWriteEmptyNodes = true;, the output is <link />foo, perhaps indicating that it thinks it's an empty node?
Note: my input is not strictly expected to be a web page. I know <link> might have special meaning, but I'd still like to be able to load it as a regular node.
Fiddle
https://dotnetfiddle.net/QASHg5
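For readers who cannot open the fiddle, the steps described above can be sketched roughly as follows. This is an illustrative reconstruction based only on the description, not necessarily the contents of the linked fiddle; the class name is made up, and the outputs mentioned in the comments are the ones reported in this issue rather than independently verified results.

```csharp
using System;
using HtmlAgilityPack;

class Repro
{
    static void Main()
    {
        var raw = "<link>foo</link>";

        var htmlDocument = new HtmlDocument();
        htmlDocument.LoadHtml(raw);

        // Reported in this issue: written as <link>foo, without the closing tag.
        Console.WriteLine(htmlDocument.DocumentNode.OuterHtml);

        htmlDocument.OptionWriteEmptyNodes = true;

        // Reported in this issue: now written as <link />foo.
        Console.WriteLine(htmlDocument.DocumentNode.OuterHtml);
    }
}
```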