validator / htmlparser

The Validator.nu HTML parser https://about.validator.nu/htmlparser/
Other
56 stars 26 forks source link

Add test automation #44

Closed sideshowbarker closed 3 years ago

sideshowbarker commented 4 years ago

The changes in this pull-request branch enable us, for all pull requests and pushes, to automate testing of the parser against the html5lib-tests suite — including CI execution (using GitHub Actions) of the tests .

The changes include the following commits:

…along with some related minor changes to get things working smoothly.

carlosame commented 4 years ago

I pulled your branch to investigate the error, but I ran into this:

[INFO]      [java] Exception in thread "main" java.io.FileNotFoundException: C:\Users\****\oss\htmlparser\html5lib-tests\encoding (Acceso denegado)
[INFO]      [java]      at java.base/java.io.FileInputStream.open0(Native Method)
[INFO]      [java]      at java.base/java.io.FileInputStream.open(FileInputStream.java:212)
[INFO]      [java]      at java.base/java.io.FileInputStream.<init>(FileInputStream.java:154)
[INFO]      [java]      at java.base/java.io.FileInputStream.<init>(FileInputStream.java:109)
[INFO]      [java]      at nu.validator.htmlparser.test.EncodingTester.main(EncodingTester.java:116)

which is caused by ant giving a directory name to EncodingTester while that class expects a filename. Obviously the FileInputStream is not happy.

Is that branch fully updated?

anthonyvdotbe commented 4 years ago

Indeed, EncodingTester hasn't been adapted yet to take a directory, though the others have ( see 62baf8a14eaabe5a8facfe862786a7aa2d69d780 )

carlosame commented 4 years ago

For the record, I was able to execute the tests without the fork="true" and all I had to do is to update the version of the maven-antrun-plugin from 1.7 to 3.0.0. Updating the maven-bundle-plugin from 2.3.7 to 5.1.1 doesn't hurt either.

sideshowbarker commented 4 years ago

For the record, I was able to execute the tests without the fork="true" and all I had to do is to update the version of the maven-antrun-plugin from 1.7 to 3.0.0

Yes — I changed the version locally, and removed the fork="true" attributes, and confirmed it works as expected.

Updating the maven-bundle-plugin from 2.3.7 to 5.1.1 doesn't hurt either.

OK, went ahead and made that change too.

Pushed the updates to this branch. Thanks again very much.

sideshowbarker commented 4 years ago

Well I just now see that dropping fork="true" causes the Java 8 build to fail.

https://github.com/validator/htmlparser/pull/44/checks?check_run_id=1015950576

Maybe I should just drop Java 8 from the test matrix? Or do we want to be testing Java 8?

carlosame commented 4 years ago

Maybe I should just drop Java 8 from the test matrix? Or do we want to be testing Java 8?

If you test with Java 8, ant is going to look for tools.jar which you removed from classpath in d0dd5abc.

Testing with version 11 and then the latest JDK (14 currently) makes sense, Java 8 is possibly unnecessary.

carlosame commented 4 years ago

Incidentally, in css4j I'm compiling my tests for target 8 due to a JPMS bug in the maven-compiler-plugin.

That bug is caused by using test classes in one module from the test classes in another module (yes the test-jar is listed as a dependency), and in principle this project has nothing close to that so it should not be affected.

sideshowbarker commented 4 years ago

So as y’all may have noticed, I changed the code in this PR branch to remove the kludgey AntRun additions I’d made, and instead use @anthonyvdotbe’s Html5libTest class.

That approach currently requires adding an extra config setting, to teach Surefire where the html5lib-tests directory is.

That config setting would be unnecessary if/when we end up moving to a new directory layout that follows Maven conventions. But after we do ever get there, the POM can be updated to drop the extra config setting.

@anthonyvdotbe Can you confirm you’re agreeing to license the Html5libTest code under the existing project license in the https://github.com/validator/htmlparser/blob/master/LICENSE.txt file? (It’s essentially the MIT license.)

If so, do you want to have a Copyright (c) 2020 Anthony Vanelverdinghe line included in the license/copyright header comment in the source — along with a Copyright (c) 2020 Mozilla Foundation line?

Or are you instead OK with the license header having only the Copyright (c) 2020 Mozilla Foundation line? (In other words, in that case, could you confirm that you’re waiving your right to have your name included in the copyright statement.)

anthonyvdotbe commented 4 years ago

@anthonyvdotbe Can you confirm you’re agreeing to license the Html5libTest code under the existing project license in the https://github.com/validator/htmlparser/blob/master/LICENSE.txt file? (It’s essentially the MIT license.)

Yes, I confirm that I agree.

Or are you instead OK with the license header having only the Copyright (c) 2020 Mozilla Foundation line? (In other words, in that case, could you confirm that you’re waiving your right to have your name included in the copyright statement.)

I confirm that I'm waiving my right to have my name included in the copyright statement.

carlosame commented 4 years ago

I changed the code in this PR branch to remove the kludgey AntRun additions I’d made, and instead use @anthonyvdotbe’s Html5libTest class.

The conversion of all the tests (including Html5libTest) to JUnit is something that could be done after this PR is merged, and mostly looks like a search-and-replace trick.

sideshowbarker commented 4 years ago

The conversion of all the tests (including Html5libTest) to JUnit is something that could be done after this PR is merged, and mostly looks like a search-and-replace trick.

OK. But I personally know almost nothing about JUnit, so anything that’d need to be done to move to JUnit would have to come in form of PRs with patches — all of which I’d be happy to test, and to review (and learn from…).

But if we were to move to JUnit, I would hope there’d be some concrete measurable benefit to doing so. I mean, the current POJO way that Html5libTest uses seems to me to at least have the advantage of getting the job done while also being very simple. I would imagine that moving to JUnit would make it less simple. So I’d wonder what concretely we’d be gaining by losing the simplicity and taking on the cost of some greater complexity.

Or maybe if I had more domain knowledge in this area, it’d be clear to me why moving to JUnit makes the best sense. Maybe there’s some intrinsic/potential value in it that’s generally well understood already by most people.

carlosame commented 4 years ago

Maybe there’s some intrinsic/potential value in it that’s generally well understood already by most people.

For example, not having to tell the Surefire configuration which files are tests and which aren't.

I personally know almost nothing about JUnit

It is easy and fast to learn the basics.

so anything that’d need to be done to move to JUnit would have to come in form of PRs with patches

I do not have plans to send more PRs.

carlosame commented 4 years ago

I see that you re-enabled testing for Java 8, which is a good thing (commits 05b9024 and a27c7b6), but using Java 8 is unlikely to get along very well with the current modularization patches (as long as there is the requirement that the minimum JDK to process the POM is 11). And running automated testing with Java 11 or higher is going to be a quite different game once the project is modularized.

It's not that you cannot do either of those things (you can even use Java 8 if the POM is complicated enough) but modularization is highly coupled with testing, and setting up an automated testing infrastructure without accounting for modules has the potential of being a waste of time.

IMHO either this PR or the modularization PR should be merged as soon as possible (BTW after making any necessary changes), and then the other PR can rebase on top of it. Yes I assume that you can't do a lot either way, just making sure that you are aware of this.

sideshowbarker commented 4 years ago

@carlosame Thanks for the heads-up about the issue dealing with Java 8 in the face of the modularization patches.

It’d be pretty disappointing if Maven, in all its baroque complexity, didn’t provide a way to build a modularized jar for current Java versions while also remaining able to ensure the code at least continues to compile under Java 8.

I personally am not willing to cavalierly ignore all the users who possibly might be constrained to needing to use the parser in a Java 8 environment. I don’t have a way to measure how many such users the parser might have. For all I know, it could be there are so few users of the parser that need Java 8 compatibility that we don’t need to worry about it. Or it could be that there are some important users for whom we’d be abandoning when we rightly shouldn’t be.

But if we can’t even ourselves continue to have an automated way to test that the code continues compiles under Java 8, then we have no way to know when some code change we made has broken compatibility with Java 8.

sideshowbarker commented 4 years ago

On further reflection, it occurs to me to ask: As far as building with Maven goes, couldn’t we just have one pom.xml for building/testing the “normal” build for Java 11+, and then a second pom.xml for building/testing the Java 8 build?

carlosame commented 4 years ago

ensure the code at least continues to compile under Java 8

Hmm that's not what I was meaning. The modularization patches in PR #43 generate bytecode for Java 8 (IMHO should be 7) except for the module-info file. One thing is "compiling for Java 8", another is "testing with Java 8". In fact, the most likely outcome for this project's modularization is that the actual bytecode being tested will be version 8 but the JDK used during the tests is 11, 15 or whatever.

Actually, your current patches in this PR are always testing Java 8 bytecode, you are just producing and testing that level-8 bytecode with different JDKs. And source-level compatibility is set to 8 as well.

carlosame commented 4 years ago

Additional clarification to my previous post: the idea is that Java 8 users will be able to use the compiled jar files, but won't be able to build them. A modular JDK is required to build.

If you still want to test with Java 8 and want to avoid undesirable POM hacks, you could set up a separate test project that would depend on a level-8 test-jar produced by the main build. But I see no real reason to do that.

sideshowbarker commented 3 years ago

I believe everything in this PR branch is now also in the main branch — so I’m going ahead and closing this.