iipc / openwayback

The OpenWayback Development
http://www.netpreserve.org/openwayback
Apache License 2.0

Inconsistent JAR dependencies #350

Open ibnesayeed opened 7 years ago

ibnesayeed commented 7 years ago

When we build the source using mvn package and extract the dist/target/openwayback.tar.gz file, two lib directories are created: one outside the webapp and one inside the webapp under WEB-INF. The one inside WEB-INF has 79 files, while the one outside has only 62. Here is the comm output with common files removed: the first (unindented) column shows files unique to the outside lib directory, and the second (indented) column shows files unique to the inner lib directory.

    antlr-2.7.5.jar
    arq-2.2.jar
    arq-extra-2.2.jar
    commons-cli-1.0.jar
commons-cli-1.2.jar
    concurrent-jena-1.3.2.jar
    foresite-0.9.jar
hadoop-ant-0.20.2-cdh3u4.pom
    icu4j-3.4.4.jar
    iri-0.5.jar
    jdom-1.0.jar
    jena-2.5.5.jar
    jenatest-2.5.5.jar
    json-jena-1.0.jar
    log4j-1.2.12.jar
log4j-1.2.17.jar
    lucene-core-2.2.0.jar
    rome-0.9.jar
stax-api-1.0.1.jar
    stax-api-1.0.jar
    wstx-asl-3.0.0.jar
    xalan-2.7.0.jar
    xercesImpl-2.7.1.jar
    xml-apis-1.0.b2.jar
    xmlParserAPIs-2.0.2.jar

This shows that the outside lib directory has newer versions of commons-cli, log4j, and stax-api. I think the packaging needs an update for consistency, unless there is a reason why it is the way it is.
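
For reference, a listing like the one above can be produced roughly as follows. This is only a minimal sketch: lib_diff is a hypothetical helper name, and the two paths in the commented-out call are assumptions about where the tarball was unpacked, so adjust them to the actual layout.

```shell
#!/bin/sh
# Hypothetical helper: print file names that exist in only one of two lib directories.
# Column 1 (unindented): only in the first directory.
# Column 2 (tab-indented): only in the second directory.
lib_diff() {
    first_list=$(mktemp)
    second_list=$(mktemp)
    # comm requires sorted input; LC_ALL=C keeps sort and comm in agreement on ordering.
    ls "$1" | LC_ALL=C sort > "$first_list"
    ls "$2" | LC_ALL=C sort > "$second_list"
    # -3 suppresses the third column (names common to both directories).
    LC_ALL=C comm -3 "$first_list" "$second_list"
    rm -f "$first_list" "$second_list"
}

# Example call (both paths are assumptions, not taken from the build output):
# lib_diff openwayback/lib openwayback/webapp/WEB-INF/lib
```

Passing the outside lib directory first and the WEB-INF one second gives the same column order as the listing above.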

runderwood commented 7 years ago

The dependency tree indicates to me that these differences make sense. As an example, you can see the commons-cli-X.Y.jar difference derives from the webapp's use of the Foresite Toolkit:

[INFO] \- com.googlecode.foresite-toolkit:foresite:jar:0.9:compile
[INFO]    +- com.hp.hpl.jena:jena:jar:2.5.5:compile
[INFO]    |  +- com.hp.hpl.jena:arq:jar:2.2:compile
[INFO]    |  |  \- org.apache.lucene:lucene-core:jar:2.2.0:compile
[INFO]    |  +- com.hp.hpl.jena:arq-extra:jar:2.2:compile
[INFO]    |  |  \- com.hp.hpl.jena:jenatest:jar:2.5.5:compile
[INFO]    |  +- com.hp.hpl.jena:iri:jar:0.5:compile
[INFO]    |  +- antlr:antlr:jar:2.7.5:compile
[INFO]    |  +- com.hp.hpl.jena:concurrent-jena:jar:1.3.2:compile
[INFO]    |  +- com.ibm.icu:icu4j:jar:3.4.4:compile
[INFO]    |  +- com.hp.hpl.jena:json-jena:jar:1.0:compile
[INFO]    |  +- log4j:log4j:jar:1.2.12:compile
[INFO]    |  +- stax:stax-api:jar:1.0:compile
[INFO]    |  +- org.codehaus.woodstox:wstx-asl:jar:3.0.0:compile
[INFO]    |  +- xerces:xercesImpl:jar:2.7.1:compile
[INFO]    |  \- xerces:xmlParserAPIs:jar:2.0.2:compile
[INFO]    +- rome:rome:jar:0.9:compile
[INFO]    +- jdom:jdom:jar:1.0:compile
[INFO]    +- xalan:xalan:jar:2.7.0:compile
[INFO]    |  \- xml-apis:xml-apis:jar:1.0.b2:compile
[INFO]    +- commons-cli:commons-cli:jar:1.0:compile
[INFO]    \- joda-time:joda-time:jar:1.6:compile

I may be misunderstanding, but I think this is okay. You can build the dependency trees with mvn:

mvn test-compile dependency:tree
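
The tree output is long, so a small filter can help spot artifacts that resolve to more than one version across modules. The following is a rough sketch, not part of the project: multi_version is a hypothetical helper name, and it assumes the plain group:artifact:jar:version:scope form shown above (coordinates with a classifier would need extra handling).

```shell
#!/bin/sh
# Hypothetical helper: read `mvn dependency:tree` output on stdin and print
# every group:artifact that appears with more than one version.
multi_version() {
    # Take the last field of each [INFO] line that looks like jar coordinates
    # and reduce it to "group:artifact version".
    awk '$1 == "[INFO]" && $NF ~ /:jar:/ {
             split($NF, p, ":")
             print p[1] ":" p[2], p[4]
         }' | LC_ALL=C sort -u |
    # Collect the distinct versions per artifact and keep those with more than one.
    awk '{ n[$1]++; v[$1] = v[$1] " " $2 }
         END { for (a in n) if (n[a] > 1) print a ":" v[a] }' | LC_ALL=C sort
}

# Example call, run from the project root:
# mvn test-compile dependency:tree | multi_version
```

Each output line names one artifact followed by the versions found for it, which makes it easier to see which modules' trees to compare.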

@ibnesayeed @ldko

ibnesayeed commented 7 years ago

@runderwood, I think a more interesting question is why there are two separate lib directories with so many overlapping JARs. What are their distinct purposes and roles? If the two serve different and independent purposes, then such discrepancies are fine. However, if largely the same JARs are packed into a lib directory of the build artifact so that people can use them without necessarily using the webapp, then either the redundancy should be removed and documented, or the dependencies should be kept in sync.

runderwood commented 7 years ago

As I understand it, the problem is that the dependencies in question are dependencies of dependencies (transitive dependencies), so they're not really under our control.

ibnesayeed commented 7 years ago

@runderwood, that makes sense. However, the question still remains: why do we need two separate lib directories? If one is a superset of the other, then we can perhaps get rid of the other.

runderwood commented 7 years ago

My understanding is that OpenWayback is the parent of the OpenWayback Web App. So they're built at different stages, and they have different dependencies.

There's no way I know of to force a dependency, like the foresite toolkit, to use a different version of a dependency, and I certainly don't know of a way to do so across build targets.

It may be the case that there are extraneous dependencies in the web app. I haven't dug in that far. But I think that's a different question from the one you're raising here. My sense is that a) this is just the way dependency management works in a project w/ multiple targets and b) it's not really a problem.

Of course, I reserve the right to be wrong.