greg84 closed this 10 years ago
I'm starting to wonder whether I should be using CsQuery for parsing XML! Don't think it was designed to do that. Would be nice if I could get it to work though.
I've implemented a "dirty hack", which replaces tags that don't allow content in HTML with tags that do before I create the CQ instance. So for example, <link> becomes <div data-xml="link">. It works, but it's a bit filthy.
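For anyone who wants to try the same workaround, here is a minimal sketch of what such a pre-processing step might look like. It is not the code from this thread: the class name XmlTagHack, the method RewriteVoidTags, and the choice of tags beyond link are assumptions made for illustration, and the regex will misbehave on attribute values that contain ">".

```csharp
using System;
using System.Text.RegularExpressions;

public static class XmlTagHack
{
    // Tags an HTML parser treats as void (no content allowed).
    // Only "link" comes from the thread; "meta" and "base" are assumed extras.
    private static readonly Regex VoidTag = new Regex(
        @"<(/?)(link|meta|base)\b([^>]*)>",
        RegexOptions.IgnoreCase);

    // Rewrites <link ...> to <div data-xml="link" ...> and </link> to </div>
    // so that an HTML parser keeps the text between the tags.
    public static string RewriteVoidTags(string xml)
    {
        return VoidTag.Replace(xml, m =>
        {
            if (m.Groups[1].Value == "/")
            {
                return "</div>";
            }

            string name = m.Groups[2].Value.ToLowerInvariant();
            string attrs = m.Groups[3].Value.TrimEnd();

            // An XML self-closing tag (<link/>) must become an explicit
            // open/close pair, because HTML ignores the trailing slash on div.
            bool selfClosing = attrs.EndsWith("/");
            if (selfClosing)
            {
                attrs = attrs.Substring(0, attrs.Length - 1).TrimEnd();
            }

            string open = "<div data-xml=\"" + name + "\"" + attrs + ">";
            return selfClosing ? open + "</div>" : open;
        });
    }
}
```

The rewritten string can then be handed to CsQuery (for example via CQ.Create) and the original elements selected with an attribute selector such as div[data-xml=link]. Namespaced tags like <atom:link> are deliberately left untouched by this pattern.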
True - it is not really designed for parsing XML. The behavior you're seeing is driven by HtmlParserSharp (a C# port of the validator.nu HTML parser), which is a true HTML parser and follows the spec rules.
There are a couple other threads in the issues here about HTML parsing. There are some things that could be done to make it work right/better but I haven't had time to look into it.
I'm trying to load an XML document for parsing with CsQuery. Part of it has the following markup:
The <link> tag has no content in HTML5 (it's a void element), so I want this document to be parsed as XML. I assumed the following should work to get the text between the tags, but it returns an empty string.
I load the document using this method (this is the actual URL I'm loading so you can see the XML document I'm working with):
After the document has been loaded, the value of cq.Document.DocType is HTML5. Shouldn't it be something else because it's an XML response? Is it an issue with CsQuery or the web site? I've read the page about character encoding but can't see why this isn't working.