Closed Constantin1489 closed 5 months ago
Thanks for your feedback, I’ve merged your PR. My understanding is that the files in the results directory are mostly for review purposes, so it could be that the files are added again at a later stage.
Thank you! From my research, at least the HTML 4.0 parser of libxml2 does not conform to the DOM (https://gitlab.gnome.org/GNOME/libxml2/-/issues/716), which means a document can end up with two html root element nodes.
Should I add the test case?
The significance of this test is that it will at least show failures for the popular HTML parser in Python. It will also require other applications. It will also prevent the misconception that HTML parsing in Python is not possible with XPath 2 or higher.
As for the claim of a misconception: at least as I understand it, "XPath 1.0 is possible for html" logically implies the html document is DOM similar structure. Therefore XPath 2 or higher is possible for the HTML document. But the libxml2 HTML parser is a "non verifying parser" (https://gnome.pages.gitlab.gnome.org/libxml2/devhelp/libxml2-HTMLparser.html).
You're always very welcome to add new test cases.
(However, test cases that merely demonstrate problems with third-party products can be a bit difficult to manage. I don't know what other implementors do, but when we encounter a test that we can't pass because of bugs in other products, we just exclude the test.)
"XPath 1.0 is possible for html" logically implies the html document is DOM similar structure.
Actually, it implies that the HTML document can be mapped to an instance of the XPath 1.0 data model. Which is not the same as DOM.
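To illustrate the distinction, here is a minimal sketch using only the Python standard library. It maps HTML tokens onto a plain element tree and then queries that tree with path expressions, with no DOM API involved. ToyTreeBuilder and html_to_tree are hypothetical names made up for this example, the mapping is a toy one (not the mapping defined by any specification), and ElementTree supports only a small XPath subset rather than full XPath 1.0.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# Elements that never have content, so they are not pushed on the stack.
VOID = {"area", "base", "br", "col", "embed", "hr", "img",
        "input", "link", "meta", "source", "track", "wbr"}


class ToyTreeBuilder(HTMLParser):
    """Hypothetical helper: turns HTML tokens into an ElementTree."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.root = None
        self.stack = []

    def handle_starttag(self, tag, attrs):
        attrib = {name: (value or "") for name, value in attrs}
        if self.root is None:
            elem = self.root = ET.Element(tag, attrib)
        else:
            # Assumption: a stray second top-level element is attached
            # to the first root, so the tree always has a single root.
            parent = self.stack[-1] if self.stack else self.root
            elem = ET.SubElement(parent, tag, attrib)
        if tag not in VOID:
            self.stack.append(elem)

    def handle_endtag(self, tag):
        # Close the nearest open element with this name, if any.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        if not self.stack:
            return
        current = self.stack[-1]
        if len(current):
            # Text after a child element is stored as that child's tail.
            current[-1].tail = (current[-1].tail or "") + data
        else:
            current.text = (current.text or "") + data


def html_to_tree(markup):
    builder = ToyTreeBuilder()
    builder.feed(markup)
    builder.close()
    return builder.root


doc = html_to_tree(
    "<html><body><p class='a'>one</p><p>two<br>three</p></body></html>"
)
texts = [p.text for p in doc.findall(".//p")]
classes = [p.get("class") for p in doc.findall(".//p[@class]")]
print(texts, classes)
```

The point of the sketch is only that path queries need some tree of nodes to run against. Which tree a given HTML input maps to, including how many root elements it has, is decided by the mapping and not by the query language.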
… in results
Hi! Thank you for the nice "fn:parse-html" test methodology and test cases.
According to _readme.md, I think that the file called test-1366.xml, which is generated by generateXml.xsl, is unnecessary and is duplicated. I also removed the empty file called test-1204.html. I skimmed the code below. Sadly, I don't know Java or XQuery, so I just checked the file extensions of the outputs in the code.
Thank you for your work! Without your wisdom and those methods above, I would have been suffering with hand-crafting test cases and continually fixing bugs.
In case you are interested in what I'm doing, I'm currently investigating the reducibility from HTML DOM to XDM in the Python libraries. (Since I'm not an expert in this field, I built my argument before I noticed the mapping explanation in the function of the new 4.0 draft.)