Merge commits from validator-nu branch to master (then delete validator-nu branch)

sideshowbarker commented 4 years ago

I’d like to merge everything from the validator-nu branch to master, then delete the branch.

And then as far as the HTML checker, the parser integration for it could be built from the master branch here rather from from the validator-nu branch it has been getting built from.

So, looking at https://github.com/validator/htmlparser/compare/validator-nu, there are 19 commits in the validator-nu branch which aren’t in master. None have any merge conflicts with master.

(The validator-nu branch did previously have 20 commits; the additional commit was 2bbdc4d, which removed all the code for the XOM API — but that commit’s no longer in the branch.)

Among those 19 commits, 15 are changes to the error-reporting behavior of the parser.

And among those 15, the largest set are changes necessary to make the error-reporting behavior conform to current requirements in the HTML spec; those 7 commits are the following:

e03e56e Require UTF-8 9ce4bd4 Conform ampersand-error reporting to HTML spec 29a2645 Make consecutive hyphens in comments a non-error f738d97 Drop parse error for missing end tag 1cd256d Emit error (not warning) for HTML4/XHTML1 doctype da6c1ec Report error always for Transitional doctype 1ae6c63 Support “generate all implied end tags thoroughly”

The second-largest set of those error-reporting-behavior commits are 5 bug fixes:

003ad33 Fix typo in error message about over-deep tree ef15099 Fix "non-space characters insided a table" typo de301bc Fix grammar problem in HTML parser error message f4f266c Correct error for EOF in “in template” state 7528857 Report 1024 as byte limit for meta charset sniff

So I think all 12 of those commits listed above could be considered necessary changes.

The remaining 3 changes to the error-reporting behavior, although not strictly necessary, are nonetheless pretty clearly improvements:

143a10b Improve message: bad start tag in noscript in head 5a209fd Remove warning about comments before doctype 5c8fe7a Stop reporting HTML4-specific parse errors

Then, along with the 15 changes above to the error-reporting behavior, there’s 1 change to the core code that, from discussion in #10, it seems we already clearly know we want in master too:

1357528 Ensure every Locator is also a Locator2

And then the remaining 2 changes to the code are minor ones that we don’t necessarily need but that, since the work’s already been done, I guess we should go ahead and merge:

1c66c6f Ensure Java8-runnable code even if Java9-compiled 895f606 Add suppress "unchecked" for a TreeBuilder method

Lastly, there’s one commit that’s not a code change but that just checks support files into the repo:

c5d11f9 Add .classpath, .settings, .project, .mailmap

So while that last change isn’t strictly necessary, it’s still generally useful — or at least it us for anybody using Eclipse. And in fact, I guess the .settings bits of it are particularly useful in that they can help contributors adhere to a consistent coding style in additions/updates to the source.

anthonyvdotbe commented 4 years ago

1c66c6f Ensure Java8-runnable code even if Java9-compiled

Would you mind to explain the scenario this fixes? I'd like to understand why this change is needed.

c5d11f9 Add .classpath, .settings, .project, .mailmap

I agree with .mailmap, but I'd like to avoid the others:

.classpath and .project are generated/managed automatically by Eclipse for Maven projects. So even if you don't use Maven yourself: just the fact that Eclipse detects a pom.xml file (which I'm planning to update as part of #14), will make it manage those files itself
.settings contains preferences, some of which are derived by the IDE from the pom.xml as well. Nearly all others are related to style/formatting. However, w.r.t. formatting, I believe it would be better to add an .editorconfig file, which covers the basics such as indentation and EOL characters, and is supported in pretty much every editor. For detailed formatting/style conventions, I believe PR reviews would suffice.

sideshowbarker commented 4 years ago

1c66c6f Ensure Java8-runnable code even if Java9-compiled

Would you mind to explain the scenario this fixes? I'd like to understand why this change is needed.

The context is https://github.com/validator/validator/issues/634. For a succinct description of the underlying cause, see https://github.com/eclipse/jetty.project/issues/3244#issuecomment-452378420

c5d11f9 Add .classpath, .settings, .project, .mailmap …

.classpath and .project are generated/managed automatically by Eclipse for Maven projects. So even if you don't use Maven yourself: just the fact that Eclipse detects a pom.xml file (which I'm planning to update as part of #14), will make it manage those files itself

Yes, so I agree we don’t need those.

.settings contains preferences, some of which are derived by the IDE from the pom.xml as well. Nearly all others are related to style/formatting.

Yes, those style/formatting ones are the ones we care about

However, w.r.t. formatting, I believe it would be better to add an .editorconfig file, which covers the basics such as indentation and EOL characters, and is supported in pretty much every editor.

It’s been desirable for us so far to enforce more than just the basics. The style/formatting settings in the .settings files came originally from @hsivonen to me years ago because I wanted to ensure the changes I made were consistent with what he was doing.

For detailed formatting/style conventions, I believe PR reviews would suffice.

Sure. But in that case what I would want to ask is that anybody doing reviews should check, as part of the review, that the code they are reviewing conforms to the same style and formatting rules that are formalized in those .settings files. So is there a good way to do that which doesn’t require having those files in the repo? Or is there some other tooling we could have in the repo that provides the same level of granularity for style/format checking as those files?

In the end, if we don’t agree on those files going into master, then I’m OK with that. It’d just be a bit disappointing to me if we lose any way to help automate enforcement of style and formatting.

anthonyvdotbe commented 4 years ago

The context is validator/validator#634. For a succinct description of the underlying cause, see eclipse/jetty.project#3244 (comment)

Thanks, I see now. But isn't it possible to update the build to use --release by now? Apparently it wasn't possible at the time due to Nashorn usage, but it seems Nashorn was eliminated in https://github.com/validator/validator/commit/941c81a2c6f5e6899c70368e558e8eadb3f209ec?

Sure. But in that case what I would want to ask is that anybody doing reviews should check, as part of the review, that the code they are reviewing conforms to the same style and formatting rules that are formalized in those .settings files. So is there a good way to do that which doesn’t require having those files in the repo? Or is there some other tooling we could have in the repo that provides the same level of granularity for style/format checking as those files?

One automated solution would be to use Checkstyle and integrate it in the Maven build. This is IDE-agnostic and would fail the build if the code style/format isn't compliant. But I'm not sure it's actually worth the effort, given that all current contributors seem to use Eclipse.

In the end, if we don’t agree on those files going into master, then I’m OK with that. It’d just be a bit disappointing to me if we lose any way to help automate enforcement of style and formatting.

I don't feel strongly about this, so I'm fine with bringing these files into master. I think all that is really needed, is org.eclipse.jdt.core.prefs though, and then only the org.eclipse.jdt.core.compiler.problem and org.eclipse.jdt.core.formatter properties.

sideshowbarker commented 4 years ago

The context is validator/validator#634. For a succinct description of the underlying cause, see eclipse/jetty.project#3244 (comment)

Thanks, I see now. But isn't it possible to update the build to use --release by now?

OK, thanks — yeah, I can no longer reproduce the failure after dropping 1c66c6f, rebuilding the checker with --release 8 with Java 13, and then testing the resulting jar with Java 8.

So I’ve removed 1c66c6f from the validator-nu branch.

I’ve also removed the c5d11f9 “Add .classpath, .settings, .project, .mailmap” change from the validator branch. I can add the .mailmap change to master in a separate change, and we can use a pull request to discuss if/how to bring the org.eclipse.jdt.core.prefs stuff to master.

So we’re down to 17 commits in the validator-nu branch to merge over.

carlosame commented 4 years ago

Commits e03e56e, 1cd256d, da6c1ec and 5c8fe7a may cause regressions for end users, couldn't they be made/kept configurable (the last commit removes configurability for DOCTYPE)?

For example, 1cd256d enforces behaviour which is intended for "conformance checkers" (see whatwg commit 31c20af) but could be undesirable for other use cases (or at least is not how it used to work).

The XML DocumentBuilder in my DOM implementation (css4j's "native DOM") has a

setStrictErrorChecking(boolean)

that toggles the document builder to a less/more strict behaviour (default is strict), perhaps something similar could be added here to allow non-UTF8, etc.

Also, deprecating public methods before removing them seems like a more sensible policy for a project which has been around for quite a few years and is at version 1.4 already (I mean it is not a 0.x thing).

sideshowbarker commented 4 years ago

Commits e03e56e, 1cd256d, and da6c1ec are all necessary for bringing the parser into conformance with the current HTML spec. The user contract for the parser has always been that it intends to conform to the current requirements in the HTML spec. So it’s not a regression if the user starts getting a new error message from the parser due to the spec requirements having changed and the parser therefore changing in response to the spec change.

We have never provided an option for users to opt-into non-conformance with the HTML spec.

5c8fe7a is the only change that’s not strictly required for conformance with HTML spec requirements. But that’s because it’s not related to requirements in the current HTML spec but instead to requirements in the HTML4 spec. IMHO there’s no longer a clear-enough user need for the behavior to justify keeping the code around. Keeping just contributes to additional code paths and complexity in the code, and it’s not clear there’s any longer enough user real-world benefit to justify the cost of that additional code complexity.

carlosame commented 4 years ago

Commits e03e56e, 1cd256d, and da6c1ec are all necessary for bringing the parser into conformance with the current HTML spec.

You are right, I assumed that err() was producing an exception which is not the case. Sorry for the noise.

We have never provided an option for users to opt-into non-conformance with the HTML spec.

My setStrictErrorChecking is useful to prevent certain DOM-level exceptions from being thrown, but apparently you have all your error detection logic at lower level.

I should not try to look at patches for code that I do not really want to read in full (I manage a project which has DOM stuff, so not a good idea).

Keeping just contributes to additional code paths and complexity

For 5c8fe7a, one possibility would be to produce a version 1.5 with the class and methods (that are to be removed) being deprecated, and then a 2.0 with the code finally removed.

Deprecating first is a good practice but is a bit of additional work, not sure if worth it.

sideshowbarker commented 4 years ago

For 5c8fe7a, one possibility would be to produce a version 1.5 with the class and methods (that are to be removed) being deprecated, and then a 2.0 with the code finally removed.

Deprecating first is a good practice but is a bit of additional work, not sure if worth it.

Well, It might be more worth it if we found there are other things we want to deprecate while we’re at it. I mean, as long as we were going to be making breaking changes and bumping the major version number, it’d make sense to see if there’s anything else that seems like it’s reached its end of life/utility.

hsivonen commented 4 years ago

Sorry about the extremely slow reaction. (Again.)

e03e56e Require UTF-8

r+

9ce4bd4 Conform ampersand-error reporting to HTML spec

I added a couple of notes about the ASCII alphanumeric case in the ambiguous ampersand case.

29a2645 Make consecutive hyphens in comments a non-error

r+ with the note that COMMENT_LESSTHAN appears to transition to itself explicitly. Why?

f738d97 Drop parse error for missing end tag

r+

1cd256d Emit error (not warning) for HTML4/XHTML1 doctype

r+

da6c1ec Report error always for Transitional doctype

r+

1ae6c63 Support “generate all implied end tags thoroughly”

r+

003ad33 Fix typo in error message about over-deep tree

r+ (Needs Gecko-side error message adjustment)

ef15099 Fix "non-space characters insided a table" typo

r+ (Needs Gecko-side error message adjustment)

de301bc Fix grammar problem in HTML parser error message

r+ (Needs Gecko-side error message adjustment)

f4f266c Correct error for EOF in “in template” state

r+

7528857 Report 1024 as byte limit for meta charset sniff

r+

143a10b Improve message: bad start tag in noscript in head

r+ (needs Gecko-side error reporting adjustment)

5a209fd Remove warning about comments before doctype

r+

5c8fe7a Stop reporting HTML4-specific parse errors

r+, but since this affects the public API it would be kinda nice to do a release without this, then declare 2.0, take this, and immediately do a 2.0 release. Considering how bad I've been at releases, not sure if holding anything back is worthwhile, though. Maybe if I really, really resolve to do a release in August... (I'll be even more unresponsive that usual in July.)

Perhaps it would be the clearest to land this but revert it for a release branch, when I actually get around to spinning a release for real.

1357528 Ensure every Locator is also a Locator2

r+

1c66c6f Ensure Java8-runnable code even if Java9-compiled

r+

895f606 Add suppress "unchecked" for a TreeBuilder method

r+

c5d11f9 Add .classpath, .settings, .project, .mailmap

r+

Thanks. I'll prepare a corresponding Gecko patch queue for these.

hsivonen commented 4 years ago

e03e56e Require UTF-8

This one doesn't change any m-c-side code and is ready to be merged right away.

hsivonen commented 4 years ago

003ad33 Fix typo in error message about over-deep tree

This one can be fixed in Gecko independently of this merge, as the change is not in generated code.

hsivonen commented 4 years ago

This one can be fixed in Gecko independently of this merge, as the change is not in generated code.

I take that back, I misread what file changed.

hsivonen commented 4 years ago

@sideshowbarker, could you, please, check that you are happy with how the patch authorship is recorded: https://github.com/hsivonen/gecko/commits/validator-nu ? There are links to the Phabricator reviews from each of the changesets. (In case anyone wants pull these, the hashes expect a git-cinnabar mozilla-unified checkout as the base as opposed to a gecko-dev base).

sideshowbarker commented 4 years ago

@sideshowbarker, could you, please, check that you are happy with how the patch authorship is recorded: https://github.com/hsivonen/gecko/commits/validator-nu ?

Yup, all fine by me

sideshowbarker commented 4 years ago

@hsivonen Can you clarify what the next steps are?

I guess we have two separate places where the changes need to be merged? One place is to the repo where they need to do for Gecko use, and the second place other is the master branch of this repo.

At what point shall we merge the changes to the master branch of this repo?

The steps I anticipate are:

Merge all changes except 5c8fe7a from the validator-nu branch to the master branch
Tag the master branch with a 1.5 tag.
Do a 1.5 Maven release to the Central Repo
Merge 5c8fe7a to the master branch.
Tag the master branch a 2.0 tag.
Do a 2.0 Maven release to the Central Repo

I can handle the deployment of both releases to the Central Repo; specifically, to the https://repo1.maven.org/maven2/nu/validator/htmlparser/ area where the jars from the validator-nu branch have been published (rather than the old https://repo1.maven.org/maven2/nu/validator/htmlparser/htmlparser/ subtree).

I assume you’d also be publishing zip packages of both the 1.5 and 2.0 releases https://about.validator.nu/htmlparser/ but I assume that can be done independently from the Maven packages.

carlosame commented 4 years ago

The steps I anticipate are:

I suggest adding three steps between 1 and 2:

1.b) Marking the methods to be removed in 5c8fe7a as @Deprecated 1.c) Modify the Maven POM to depend on XOM version 1.3.5 or later. 1.d) Pull my request for an automatic module name.

@anthonyvdotbe may want to produce a pull request for 1.c, as that XOM 1.3.5 release (which declares a JPMS module name) was produced after he asked for it.

anthonyvdotbe commented 4 years ago

@sideshowbarker I'd reserve the 2.0 tag for the first modular release, i.e. https://github.com/validator/htmlparser/issues/14 @hsivonen for the record, I'm waiting for your response to https://github.com/validator/htmlparser/issues/14#issuecomment-619345316. If your answer to those 3 questions is "yes", I'll provide a PR for it, which could then be merged between steps 4 & 5 above

sideshowbarker commented 4 years ago

1.b) Marking the methods to be removed in 5c8fe7a as @Deprecated

Done in #22. Please review there and let me know if I missed anything or got anything wrong.

1.c) Modify the Maven POM to depend on XOM version 1.3.5 or later.

Done in #22 (for the 1.5 release) and also in the validator.nu branch (for the 2.0 release and after).

1.d) Pull my request for an automatic module name.

Done in #22 (for the 1.5 release) and also in the validator.nu branch (for the 2.0 release and after).

sideshowbarker commented 4 years ago

@sideshowbarker I'd reserve the 2.0 tag for the first modular release, i.e. #14

Good point, and agreed — as long as we all have agreement about what the modularization should look like.

And incidentally, based on discussion with @hsivonen offline, I can say won’t doing a new release before August.

Part of the reason is that the code I wrote for 9ce4bd4 (Conform ampersand-error reporting to HTML spec) has an unintended and unacceptable side effect — regressing some existing parser tests (details) — and I anticipate it’s going to take a while to get that code rewritten and reviewed.

@hsivonen for the record, I'm waiting for your response to #14 (comment). If your answer to those 3 questions is "yes", I'll provide a PR for it, which could then be merged between steps 4 & 5 above

I’m inclined to believe that @hsivonen doesn’t have any strong opinion about any of those questions — as long as the changes that get made would not require changes to how the Java code gets transformed into C++ code and integrated into Gecko.

I don’t object to any of the changes myself, even if they require me to make changes to the validator build. But neither @hsivonen nor I have much of any experience with Java modularization nor are we ourselves users of the Maven packages.

So among the others who have voiced opinions, I think the most weight should be given to what you and @carlosame think is the right thing to do. So if you and @carlosame were to be able to reach agreement on it, I think it’s most likely that @hsivonen will end up just deferring to your judgement on it (assuming — as I mentioned earlier — that no changes that get made will end up require changes to how the Java code gets transformed into C++ code and integrated into Gecko).

But if you want a response from @hsivonen about it, you’re likely going to need to wait until August.

carlosame commented 4 years ago

Done in #22. Please review there and let me know if I missed anything or got anything wrong.

Awesome, thanks for the fast reaction, you even annotated protected methods.

There was little point in releasing 1.5 without those annotations, and while I'd personally be fine skipping the deprecation, doing that is an industry standard practice that people tend to follow religiously out there.

1.c) Modify the Maven POM to depend on XOM version 1.3.5 or later.

Done in #22 (for the 1.5 release) and also in the validator.nu branch (for the 2.0 release and after).

See the comment that I left there.

1.d) Pull my request for an automatic module name.

Done in #22 (for the 1.5 release) and also in the validator.nu branch (for the 2.0 release and after).

That's good news. A full modularization is unlikely to be ready soon, and the lack of a declared module name is affecting any downstream project that wants to modularize (and you have 150 downstream projects only in Github).

carlosame commented 4 years ago

Regarding the questions asked by @anthonyvdotbe, the first one amounts to "should we have one or two jar files?", and it is already responded in the sense that the project has only one Jar file (and no reason to split it now).

The second one (the RPM support): could be dropped if nobody is supporting that code, but I'd like to look into it in more detail later (so no final opinion for now).

The third question: while my PR for the module name is compatible with OSGi, I confirm that a full modularization would not be. So my answer is Yes, the OSGi stuff should be removed for 2.0.

won’t doing a new release before August.

And if August means using your vacation time, I'd advise to delay until September. Developer burnout is a serious issue (just look at how many unresponsive opensource projects are in Github), and disconnecting from all this stuff during vacations is important. (BTW I personally have no issue about doing software stuff in August: I haven't been in the IT industry for more than a decade...)

And finally, let's not forget about issue #16 (I'd remove that directory).

sideshowbarker commented 4 years ago

let's not forget about issue #16 (I'd remove that directory).

OK, yes — agreed, and removed in e5e552f

The second one (the RPM support): could be dropped if nobody is supporting that code,

I have no reason to believe we’re aware so far of any actual need for it, so I’ve gone ahead and just dropped it for now — both in the branch for the 1.5 release and in the validator.nu branch.

but I'd like to look into it in more detail later (so no final opinion for now).

OK — if you find any new information about an actual need for it, we could revert the removal.

The third question: while my PR for the module name is compatible with OSGi, I confirm that a full modularization would not be. So my answer is Yes, the OSGi stuff should be removed for 2.0.

OK, I’ve gone ahead and removed it from the validator-nu and also from the branch for the 1.5 release — again, because I have no reason to believe it was added to begin with due to any actual specific need for it.

won’t doing a new release before August.

And if August means using your vacation time, I'd advise to delay until September. Developer burnout is a serious issue (just look at how many unresponsive opensource projects are in Github), and disconnecting from all this stuff during vacations is important.

That all makes sense.

But note that I should have further qualified what I meant; I should have said we won’t be dong a full release before August, by which I mean publishing any non-Maven distribution at https://about.validator.nu/htmlparser/ in the way that the 1.4 release and previous releases were published.

But that doesn’t prevent first publishing a 1.5 Maven package in the same way the 1.4.1 to 1.4.16 packages have already been published.

So if it seems useful to first publish a 1.5 Maven package soon, then I could do that.

carlosame commented 4 years ago

I’ve gone ahead and removed it from the validator-nu and also from the branch for the 1.5 release

And by doing that, you removed my patch for the automatic module name, which was working on top of OSGi. I'd keep the OSGi support for 1.5 (although we all suspect that nobody is using the OSGi metadata, dropping it is effectively a backwards-incompatible change).

sideshowbarker commented 4 years ago

I’ve gone ahead and removed it from the validator-nu and also from the branch for the 1.5 release

And by doing that, you removed my patch for the automatic module name, which was working on top of OSGi.

oops

I'd keep the OSGi support for 1.5 (although we all suspect that nobody is using the OSGi metadata, dropping it is effectively a backwards-incompatible change).

OK — I’ve restored it now on the 1.5 branch

anthonyvdotbe commented 4 years ago

@sideshowbarker I'd like to ask not to include an automatic module name in the 1.5 release: adding an automatic module name implies a single module, with the nu.validator.saxtree package being part of it. However, my proposal is to modularize into 2 modules, nu.validator.htmlparser and nu.validator.saxtree, where requiring the former doesn't transitively include the latter.

On a related note, please don't upgrade the XOM version: for the sake of build reproducibility and stability, fixed versions should be used. And there's no change in the htmlparser code that requires upgrading the XOM version, so I believe we should just hold off till the 2.0 release, when the modularization will require the upgrade.

Regarding the questions asked by @anthonyvdotbe, the first one amounts to "should we have one or two jar files?", and it is already responded in the sense that the project has only one Jar file (and no reason to split it now).

This is the single remaining point to reach agreement on: should we split the htmlparser and saxtree packages into separate Maven modules? I strongly believe the answer is "yes", but @carlosame believes otherwise. So I do believe it's important for you & @hsivonen to weigh in & help reach agreement.

To reiterate my argument from the discussion in https://github.com/validator/htmlparser/issues/14:

saxtree is useful on its own, and should thus be in a Java module of its own
not having a 1-on-1 mapping between Maven modules and Java modules unnecessarily complicates the Maven build -> saxtree should be split off into its own Maven module

Moreover, by convention, module names should be a prefix of any packages it contains (to avoid running into the "split package" problem). Since the packages are nu.validator.htmlparser and nu.validator.saxtree, to comply with this convention:

either there should be a single module named nu.validator, but that doesn't make sense as a module name
there should be 2 modules, nu.validator.htmlparser and nu.validator.saxtree

For what it's worth: I've already done a modularization that splits into 2 modules, so if we do reach agreement that 2 modules is the way to go, providing a PR is a matter of rebasing against master to integrate the recent changes

carlosame commented 4 years ago

there's no change in the htmlparser code that requires upgrading the XOM version

For people on modular JDKs, versions of XOM prior to 1.3.5 are unusable, so requiring at least 1.3.5 (which 2.0 is going to require anyway) is reasonable, given that 1.5 is intended to have a module name.

should we split the htmlparser and saxtree packages into separate Maven modules?

We already discussed this a few times, but well...

saxtree is useful on its own, and should thus be in a Java module of its own

The idea that anything "useful on its own" should have its own module is inaccurate. Also, I looked into saxtree and most of it are final classes that implement DOM node interfaces.

Only someone willing to build a clone DOM implementation (why??) would be interested in that code, with TreeParser and TreeBuilder being the only interesting classes there.

not having a 1-on-1 mapping between Maven modules and Java modules unnecessarily complicates the Maven build

None of the proposals being talked about require a Maven modular build with different Java and Maven modules.

by convention, module names should be a prefix of any packages it contains (to avoid running into the "split package" problem)

Sometimes the name of a module will be the same as an umbrella package name, but sometimes not. And split packages aren't caused by module naming.

there should be 2 modules, nu.validator.htmlparser and nu.validator.saxtree

Two modules mean two jar files, with all users being forced to add configuration for a second artifact, given that nu.validator.htmlparser would depend on nu.validator.saxtree. And that, for a package with barely interesting contents.

anthonyvdotbe commented 4 years ago

For people on modular JDKs, versions of XOM prior to 1.3.5 are unusable, so requiring at least 1.3.5 (which 2.0 is going to require anyway) is reasonable, given that 1.5 is intended to have a module name.

My comment started with "I'd like to ask not to include an automatic module name in the 1.5 release" and my request to revert the XOM version was building on that.

The idea that anything "useful on its own" should have its own module is inaccurate.

Fair enough: replace "saxtree is useful on its own" with "I believe saxtree fits the definition of a module because, from my current understanding, it's a self-contained, appropriately sized, unit of functionality. Moreover, in the words of @hsivonen, "saxtree is useful on its own".

Only someone willing to build a clone DOM implementation (why??) would be interested in that code, with TreeParser and TreeBuilder being the only interesting classes there.

When I initially proposed to move the saxtree package under the htmlparser package (in the OP of #14), @hsivonen replied: "saxtree is useful on its own and it's possible to run htmlparser in a mode that doesn't use it." Sure, maybe he didn't mean that to imply it should be a module of its own, which is why I asked for confirmation on whether it's ok to split into 2 modules.

None of the proposals being talked about require a Maven modular build with different Java and Maven modules.

Obviously, my proposal does require so.

And split packages aren't caused by module naming.

I never said split packages are caused by module naming.

Two modules mean two jar files, with all users being forced to add configuration for a second artifact, given that nu.validator.htmlparser would depend on nu.validator.saxtree. And that, for a package with barely interesting contents.

Since the POM of nu.validator.htmlparser will have a dependency on nu.validator.saxtree, the latter is a transitive dependency, and there's no need to specify a second artifact. Either way: I believe proper modularization is way more important than users having to make a change to their POM.

@carlosame if it's decided in favor of 1 module, will you provide a PR to modularize the project? Because if not, I don't see who else will do so in the foreseeable future?

carlosame commented 4 years ago

I never said split packages are caused by module naming.

You related split packages to module naming and they have nothing to do, so that argument against a single module is not valid, do we agree?

will have a dependency on nu.validator.saxtree, the latter is a transitive dependency, and there's no need to specify a second artifact.

From your previous message:

nu.validator.htmlparser and nu.validator.saxtree, where requiring the former doesn't transitively include the latter

Moving targets are hard to follow, and this modularization has been one since the beginning.

will you provide a PR to modularize the project?

It is included in PR #23, and modularization is a relatively small part of the cleanups that the POM needed. Hopefully we can move on now.

anthonyvdotbe commented 4 years ago

You related split packages to module naming and they have nothing to do, so that argument against a single module is not valid, do we agree?

We don't, and I'm tired of clarifying myself every time my words are twisted and/or misinterpreted.

From your previous message:

nu.validator.htmlparser and nu.validator.saxtree, where requiring the former doesn't transitively include the latter

Moving targets are hard to follow, and this modularization has been one since the beginning.

One last time then: the htmlparser Maven module transitively requires the saxtree Maven module. However, the htmlparser Java module does not transitively require the saxtree Java module. So no moving targets, just another example of my words being misinterpreted & selectively quoted.

It is included in PR #23, and modularization is a relatively small part of the cleanups that the POM needed. Hopefully we can move on now.

As far as I'm concerned, we can't: we fundamentally disagree about whether there ought to be 1 or 2 modules. So I propose we end the discussion for now and wait for @hsivonen to take a decision.

hsivonen commented 4 years ago

I acknowledge that there are pending Maven question for me to respond to. Meanwhile, to make the gap between the branches smaller, I made a PR for merging the patches from above that are on Gecko autoland right now.

sideshowbarker commented 3 years ago

I’d like to merge everything from the validator-nu branch to master, then delete the branch.

Done 🎉

validator / htmlparser

Merge commits from validator-nu branch to master (then delete validator-nu branch) #17