qt4cg / qt4tests

QT4 tests
https://qt4cg.org/

test: Remove test-1366.xml (generateXml.xsl) and empty test-1204.html… #115

Closed Constantin1489 closed 5 months ago

Constantin1489 commented 7 months ago

… in results

Hi! Thank you for the nice "fn:parse-html" test methodology and test cases.

Going by _readme.md, I think the file test-1366.xml, which is generated by generateXml.xsl, is an unnecessary duplicate. I also removed the empty file test-1204.html.

I skimmed the code below. Sadly, I don't know Java or XQuery, so I only checked the file extensions of the outputs in the code.

Thank you for your work! Without your wisdom and the methods above, I would have struggled with hand-crafting test cases and continually fixing bugs.


In case you are interested in what I'm doing: I'm currently investigating the reducibility from HTML DOM to XDM in Python libraries. (Since I'm not an expert in this field, I had already built my argument before I noticed the mapping explanation for the function in the new 4.0 draft.)

ChristianGruen commented 5 months ago

Thanks for your feedback, I’ve merged your PR. My understanding is that the files in the results directory are mostly for review purposes, so it could be that the files are added again at a later stage.

Constantin1489 commented 5 months ago

Thank you! According to my research, at least the HTML 4.0 parser of libxml2 does not conform to the DOM (https://gitlab.gnome.org/GNOME/libxml2/-/issues/716), which means there can be two html root element nodes for a document.

Should I add the test case?

The significance of this test is that it will at least show failures for the popular HTML parser in Python. It will also require other applications. It will also prevent the misconception that HTML parsed in Python cannot be queried with XPath 2 or higher.
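The property such a test would check can be sketched as follows. This is only a minimal illustration using the standard-library ElementTree API, not libxml2 itself; the helper names and both sample trees are hypothetical, and the `bad` tree is a made-up stand-in rather than the output reported in the libxml2 issue.

```python
import xml.etree.ElementTree as ET


def html_element_report(root):
    """Return (root tag, number of elements named 'html') for an ElementTree-style tree."""
    # iter() walks the root and all of its descendants in document order.
    html_count = sum(1 for _ in root.iter("html"))
    return root.tag, html_count


def has_single_html_root(root):
    """True when the root is 'html' and no other 'html' element exists in the tree."""
    tag, count = html_element_report(root)
    return tag == "html" and count == 1


# Hand-built trees standing in for parser output (hypothetical data).
good = ET.fromstring("<html><head/><body><p>ok</p></body></html>")
bad = ET.fromstring("<html><body><html><body/></html></body></html>")

print(has_single_html_root(good))  # True
print(has_single_html_root(bad))   # False
```

Because the check only relies on `tag` and `iter()`, it should in principle also accept trees from other libraries that follow the ElementTree API, though that is an assumption and has not been verified here.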

Constantin1489 commented 5 months ago

As for the claimed misconception: at least as I see it, "XPath 1.0 is possible for html" logically implies that the HTML document has a DOM-like structure. Therefore XPath 2 or higher should also be possible for the HTML document. But the libxml2 HTML parser is a "non-verifying parser" (https://gnome.pages.gitlab.gnome.org/libxml2/devhelp/libxml2-HTMLparser.html).

michaelhkay commented 5 months ago

You're always very welcome to add new test cases.

(However, test cases that merely demonstrate problems with third-party products can be a bit difficult to manage. I don't know what other implementors do, but when we encounter a test that we can't pass because of bugs in other products, we just exclude the test.)

michaelhkay commented 5 months ago

> "XPath 1.0 is possible for html" logically implies that the HTML document has a DOM-like structure.

Actually, it implies that the HTML document can be mapped to an instance of the XPath 1.0 data model, which is not the same as the DOM.
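A rough illustration of that distinction, using only the Python standard library: the same markup is loaded once as a DOM document and once as an ElementTree, which is not a DOM (it has no text nodes at all), and a path expression still selects over the latter. Note that ElementTree supports only a limited subset of XPath, so this is a sketch of the idea rather than a demonstration of the XPath 1.0 data model; the sample markup is hypothetical.

```python
import xml.dom.minidom
import xml.etree.ElementTree as ET

SOURCE = "<html><body><p>one</p><p>two</p></body></html>"

# A DOM view: nodes expose the W3C DOM API (firstChild, nodeType, ...),
# and character data lives in separate Text nodes.
dom_doc = xml.dom.minidom.parseString(SOURCE)
dom_paragraphs = [p.firstChild.data for p in dom_doc.getElementsByTagName("p")]

# A non-DOM tree: ElementTree stores text as a string attribute of the
# element, yet a path expression can still be evaluated against it.
et_root = ET.fromstring(SOURCE)
et_paragraphs = [p.text for p in et_root.findall("./body/p")]

print(dom_paragraphs)  # ['one', 'two']
print(et_paragraphs)   # ['one', 'two']
```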