validator / htmlparser

The Validator.nu HTML parser https://about.validator.nu/htmlparser/
Other
56 stars 26 forks source link

Ensure every Locator is also a Locator2 #10

Closed faceless2 closed 4 years ago

faceless2 commented 4 years ago

This pull request will ensure every org.xml.sax.Locator created by this package also implements org.xml.sax.ext.Locator2. This extended interface (added in Java 5) allows you to retrieve the encoding of the InputSource.

The encoding is important, as it's the default encoding used for CSS imports, and as the HtmlParser will sniff the encoding from an embedded <meta charset> element, it can't easily be retrieved any other way.

The PR casts Locator to Locator2 in many places, but as every class that implements Locator in this package also implements Locator2, this is safe.

sideshowbarker commented 4 years ago

Sorry for the delay in reviewing this. In my environment, when testing this with the HTML checker code, this change causes the HTML checker to fail at runtime with the following:

/Library/Java/JavaVirtualMachines/openjdk-13.jdk/Contents/Home/bin/java -jar ./build/dist/vnu.jar --format gnu ./minDoc.html
Exception in thread "main" java.lang.ExceptionInInitializerError
    at nu.validator.client.SimpleCommandLineValidator.setErrorHandler(SimpleCommandLineValidator.java:559)
    at nu.validator.client.SimpleCommandLineValidator.setup(SimpleCommandLineValidator.java:337)
    at nu.validator.client.SimpleCommandLineValidator.main(SimpleCommandLineValidator.java:306)
Caused by: java.lang.ClassCastException: class org.xml.sax.helpers.LocatorImpl cannot be cast to class org.xml.sax.ext.Locator2 (org.xml.sax.helpers.LocatorImpl and org.xml.sax.ext.Locator2 are in module java.xml of loader 'bootstrap')
    at nu.validator.saxtree.Node.<init>(Node.java:92)
    at nu.validator.saxtree.ParentNode.<init>(ParentNode.java:55)
    at nu.validator.saxtree.DocumentFragment.<init>(DocumentFragment.java:40)
    at nu.validator.saxtree.TreeBuilder.<init>(TreeBuilder.java:89)
    at nu.validator.spec.html5.ImageReportAdviceBuilder.endElement(ImageReportAdviceBuilder.java:157)
    at nu.validator.saxtree.TreeParser.endElement(TreeParser.java:133)
    at nu.validator.saxtree.Element.revisit(Element.java:110)
    at nu.validator.saxtree.TreeParser.parse(TreeParser.java:96)
    at nu.validator.htmlparser.sax.HtmlParser.parse(HtmlParser.java:413)
    at nu.validator.spec.html5.ImageReportAdviceBuilder.parseAltAdvice(ImageReportAdviceBuilder.java:67)
    at nu.validator.messages.MessageEmitterAdapter.<clinit>(MessageEmitterAdapter.java:292)
    ... 3 more
faceless2 commented 4 years ago

My apologies - I missed the use of "org.xml.sax.helpers.LocatorImpl" to create an empty Locator for use in a DocumentFragment. I've changed this to use an empty "nu.validator.htmlparser.impl.LocatorImpl" instead, it should fix the problem.

sideshowbarker commented 4 years ago

I've changed this to use an empty "nu.validator.htmlparser.impl.LocatorImpl" instead, it should fix the problem.

OK, yup, it works as excepted for me now

@hsivonen Do you have time to review this patch?

hsivonen commented 4 years ago

r+. Thank you.

@sideshowbarker I let you push the merge button, since you know the practices of this fork better.

Once landed here, I'm then going to upstream the changeset. (I really should figure out the process of moving the upstream repo from https://hg.mozilla.org/projects/htmlparser/ under https://github.com/mozilla/ to make it all happen within git and GitHub.)

sideshowbarker commented 4 years ago

really should figure out the process of moving the upstream repo from https://hg.mozilla.org/projects/htmlparser/ under https://github.com/mozilla/ to make it all happen within git and GitHub.

That would be great :) That’d make it easier to keep the fork up to date, as well. (Currently I do it using hg-git to pull the upstream changes from the mercurial repo and merge them into the git repo for the fork.)

hsivonen commented 4 years ago

@faceless2, are you OK with the way authorship is recorded in the mechanical Gecko tree import of this patch: https://github.com/hsivonen/gecko/commit/46bc6514cee5531b7e7c8da7e704a4178afc6b6b.patch ?

faceless2 commented 4 years ago

Ha, of course - thanks for asking though.