openpreserve / jhove

File validation and characterisation.
http://jhove.openpreservation.org

Request for provision of a thin ext-modules jar #861

Open jackdos opened 1 year ago

jackdos commented 1 year ago

The jhove-ext-modules jar is currently built using the maven-assembly-plugin, which means the jar has all of its required dependencies, including third-party dependencies, built in.

This is currently causing us problems when trying to update from 1.22.1. The first problem is that versions after 1.22.1 include third-party XML-handling libraries, which clash with dependencies we already have in our application. The normal route to solving this would be to explicitly exclude the dependencies we don't want, but because they are built into the jar, we can't do that.
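One way to confirm which third-party code is physically baked into the jar, and therefore beyond the reach of Maven `<exclusions>`, is to list the packages of its `.class` entries. The following is a minimal sketch using only `java.util.jar`; `JarPackageLister` is a hypothetical helper, not part of JHOVE, and the jar path is whatever local copy you want to inspect.

```java
import java.io.IOException;
import java.util.Collections;
import java.util.Set;
import java.util.TreeSet;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

/**
 * Hypothetical helper: lists the Java packages whose compiled classes are
 * stored inside a jar. A "thin" artifact would contain only the module's own
 * packages; an assembly jar also carries the packages of its dependencies.
 */
final class JarPackageLister {

    static Set<String> bundledPackages(String jarPath) throws IOException {
        Set<String> packages = new TreeSet<>();
        try (JarFile jar = new JarFile(jarPath)) {
            for (JarEntry entry : Collections.list(jar.entries())) {
                String name = entry.getName();
                // Only compiled classes matter; skip resources and META-INF entries.
                if (!name.endsWith(".class") || name.startsWith("META-INF/")) {
                    continue;
                }
                int lastSlash = name.lastIndexOf('/');
                // Classes in the default package have no directory prefix.
                String pkg = lastSlash < 0 ? "" : name.substring(0, lastSlash).replace('/', '.');
                packages.add(pkg);
            }
        }
        return packages;
    }

    public static void main(String[] args) throws IOException {
        // args[0]: path to the jar to inspect, e.g. a downloaded jhove-ext-modules jar.
        for (String pkg : bundledPackages(args[0])) {
            System.out.println(pkg.isEmpty() ? "(default package)" : pkg);
        }
    }
}
```

Any package in the output that belongs to another project (an XML parser, for instance) is shipped inside the jar itself, so a dependency exclusion in your own POM cannot remove it.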

The second problem is that, because we have tried to update the core library while leaving the ext-modules library at its old version, we have a classpath conflict for jhove's own XmlHandler. We get one version from the jhove-core jar and an older version built into the jhove-ext-modules jar, and Java's behaviour in this area means that we can't reliably predict which class will be loaded first and used by the application.
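To see the conflict directly, you can ask the class loader for every classpath location that supplies a given `.class` file, and then check which copy was actually resolved in the current run. This is a sketch using standard JDK calls (`ClassLoader.getResources`, `Class.getProtectionDomain`); `ClasspathDuplicateFinder` is a hypothetical name, and you would pass the fully qualified name of jhove's XmlHandler as it appears in your jhove version (its package is not given here).

```java
import java.io.IOException;
import java.net.URL;
import java.security.CodeSource;
import java.util.Collections;
import java.util.List;

/** Hypothetical diagnostic: finds duplicate copies of a class on the classpath. */
final class ClasspathDuplicateFinder {

    /** Every classpath location that provides a .class file for the given binary name. */
    static List<URL> findCopies(String className) throws IOException {
        String resource = className.replace('.', '/') + ".class";
        ClassLoader loader = ClasspathDuplicateFinder.class.getClassLoader();
        return Collections.list(loader.getResources(resource));
    }

    /** Where the JVM actually loaded the class from, or null for JDK (bootstrap) classes. */
    static URL loadedFrom(String className) throws ClassNotFoundException {
        CodeSource source = Class.forName(className).getProtectionDomain().getCodeSource();
        return source == null ? null : source.getLocation();
    }

    public static void main(String[] args) throws Exception {
        // args[0]: fully qualified class name to check.
        String className = args[0];
        List<URL> copies = findCopies(className);
        System.out.println(copies.size() + " copies of " + className + ":");
        copies.forEach(url -> System.out.println("  " + url));
        System.out.println("Loaded from: " + loadedFrom(className));
    }
}
```

More than one URL in the output means two jars are competing for the same class; the "Loaded from" line only tells you which one won in this particular run and classpath ordering.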

Neither the jhove-core jar nor the jhove modules seem to use this maven assembly pattern, so it's not clear whether this is deliberate for the ext-modules or whether some other requirement of the ext-modules means it has to be done this way. Either way, at least having the option not to take the full assembly jar via the Maven dependencies would be useful.

asciim0 commented 1 year ago

Is that similar to this - https://github.com/openpreserve/jhove/issues/641 ?

jackdos commented 1 year ago

Similar, but not the same. Thanks, @asciim0.

I don't mind getting the PNG, epub, and gzip modules all bundled into one jar (even though we're only using PNG at the moment). What I "object" to, and what this ticket is about, is also getting a second copy of the compiled classes from jhove core, plus the third-party epub, XML and gzip libraries.
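The "second copy of the compiled classes from jhove core" can be measured by intersecting the `.class` entries of two jars. Below is a minimal sketch, assuming you have local copies of the jhove-core and jhove-ext-modules jars; `JarOverlap` is a hypothetical helper, not part of JHOVE or Maven.

```java
import java.io.IOException;
import java.util.Set;
import java.util.TreeSet;
import java.util.jar.JarFile;

/** Hypothetical helper: reports classes packaged in both of two jars. */
final class JarOverlap {

    /** Names of all .class entries in a jar, e.g. "org/example/Foo.class". */
    static Set<String> classEntries(String jarPath) throws IOException {
        Set<String> names = new TreeSet<>();
        try (JarFile jar = new JarFile(jarPath)) {
            // The stream must be consumed before the JarFile is closed.
            jar.stream()
               .map(entry -> entry.getName())
               .filter(name -> name.endsWith(".class") && !name.startsWith("META-INF/"))
               .forEach(names::add);
        }
        return names;
    }

    /** Class entries present in both jars: candidates for a classpath conflict. */
    static Set<String> sharedClasses(String jarA, String jarB) throws IOException {
        Set<String> shared = classEntries(jarA);
        shared.retainAll(classEntries(jarB));
        return shared;
    }

    public static void main(String[] args) throws IOException {
        // args[0], args[1]: paths to the two jars to compare.
        Set<String> shared = sharedClasses(args[0], args[1]);
        System.out.println(shared.size() + " class(es) are packaged in both jars");
        shared.forEach(System.out::println);
    }
}
```

With a thin ext-modules jar, this intersection against jhove-core would be expected to be empty; with the current assembly jar, every duplicated core class shows up here.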

It probably makes sense to try to look at both tickets together, but you could technically do either one without the other.