iipc / webarchive-commons

Common web archive utility code.
Apache License 2.0

Pom file dependency for Hadoop ("compile"->"provided") #15

Open willp-bl opened 10 years ago

willp-bl commented 10 years ago

The webarchive-commons pom file declares a specific Hadoop version as a "compile" dependency. This should probably be "provided", since the Hadoop jars will already be on the cluster and bundling them only duplicates them.

Also, my cluster is CDH4 but the version in Central relies on CDH3. I'm not yet sure whether this is what is causing my other issues.
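
For illustration, the proposed scope change might look roughly like this in the webarchive-commons pom. This is only a sketch: the CDH3 version string is an assumption and may not match what the pom actually declares.

```xml
<!-- Hypothetical Hadoop entry in the webarchive-commons pom.xml.
     The version below is an assumed CDH3 build, not confirmed from the pom. -->
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-core</artifactId>
  <version>0.20.2-cdh3u4</version>
  <!-- previously compile, which is also Maven's default when no scope is given -->
  <scope>provided</scope>
</dependency>
```

Note that "provided" is not transitive, so downstream projects would no longer inherit the Hadoop jars and would need to declare their own Hadoop dependency where they need one.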

anjackson commented 10 years ago

I've still not had time to look at this. In the meantime, anyone reliant on this artefact can exclude the Hadoop dependency in their pom.xml and add their own override:

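<!-- Use webarchive-commons, but drop the Hadoop version it pulls in transitively -->
<dependency>
  <groupId>org.netpreserve.commons</groupId>
  <artifactId>webarchive-commons</artifactId>
  <version>1.1.3</version>
  <exclusions>
    <exclusion>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-core</artifactId>
    </exclusion>
  </exclusions>
</dependency>
<!-- Declare Hadoop yourself, using the version that matches your cluster.
     The provided scope keeps the jars out of the packaged artefact. -->
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-core</artifactId>
  <version>0.20.2-cdh3u4</version>
  <scope>provided</scope>
</dependency>
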
johnerikhalse commented 8 years ago

The Hadoop dependency is needed for reading (W)ARCs from HDFS; I can't find any other uses in webarchive-commons. An OpenWayback deployment with WARCs stored in HDFS therefore depends on having these libraries included.

The easy solution is to change the dependency scope to provided here and add hadoop-core as a dependency of OpenWayback. I'm not sure whether that requires a major release or whether the change is small enough for a minor release.
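
On the OpenWayback side, the addition could be as small as the fragment below. This is a sketch only: which OpenWayback module it belongs in and which Hadoop version to use are assumptions, and the version shown is simply the one used in the override example earlier in this thread.

```xml
<!-- Hypothetical addition to an OpenWayback pom.xml.
     With no <scope> element Maven uses compile scope, so the Hadoop jars
     are packaged with the deployment and HDFS-backed (W)ARC reading keeps working. -->
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-core</artifactId>
  <!-- assumed version; pick the one that matches the target HDFS cluster -->
  <version>0.20.2-cdh3u4</version>
</dependency>
```

Deployments that do not read from HDFS could then exclude this dependency without touching webarchive-commons.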