lintool / warcbase

Warcbase is an open-source platform for managing and analyzing web archives
http://warcbase.org/

fail on declaring a dependency on warcbase-core in a SBT project #245

Open dportabella opened 8 years ago

dportabella commented 8 years ago

Since the warcbase-core artifact is not yet published in any repository (neither snapshots nor releases), I build and install it locally:

git clone http://github.com/lintool/warcbase.git
cd warcbase
mvn install

Then, in my SBT project, I add this to build.sbt:

resolvers += Resolver.mavenLocal

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % sparkVersion,
  "org.warcbase" % "warcbase-core" % "0.1.0-SNAPSHOT"
)

When I run sbt -Dspark.master=local[2] run, I get this exception:

Exception in thread "main" java.lang.NoClassDefFoundError: org/w3c/dom/ElementTraversal

My program (for now just a SparkPi example that does not use warcbase) runs ok if I remove the warcbase-core dependency from build.sbt.

Using sbt dependencyTree, I see the following:

[info]   +-org.warcbase:warcbase-core:0.1.0-SNAPSHOT [S]
[info]     +-com.chuusai:shapeless_2.10.5:2.0.0 [S]
[info]     +-com.google.guava:guava:14.0.1
[info]     +-com.syncthemall:boilerpipe:1.2.2
[info]     | +-net.sourceforge.nekohtml:nekohtml:1.9.20
[info]     | | +-xerces:xercesImpl:2.10.0 (evicted by: 2.11.0)
[info]     | | +-xerces:xercesImpl:2.11.0
[info]     | |   +-xml-apis:xml-apis:1.4.01
[info]     | |
[info]     | +-xerces:xercesImpl:2.11.0
[info]     |   +-xml-apis:xml-apis:1.4.01
[info]     |
[info]     +-edu.stanford.nlp:stanford-corenlp:3.4.1
[info]     | +-com.googlecode.efficient-java-matrix-library:ejml:0.23
[info]     | +-com.io7m.xom:xom:1.2.10
[info]     | | +-xalan:xalan:2.7.0
[info]     | | | +-xml-apis:xml-apis:1.3.03 (evicted by: 1.4.01)
[info]     | | | +-xml-apis:xml-apis:1.4.01
[info]     | | | +-xml-apis:xml-apis:2.0.2 (evicted by: 1.4.01)
...

I don't really understand the problem, but this post suggests adding the xml-apis:1.4.01 dependency. Indeed, it works with this build.sbt:

resolvers += Resolver.mavenLocal

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % sparkVersion,
  "org.warcbase" % "warcbase-core" % "0.1.0-SNAPSHOT",
  "xml-apis" % "xml-apis" % "1.4.01"
)
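Since the dependency tree above lists several xml-apis versions (1.3.03, 1.4.01, 2.0.2), another option might be to pin the version rather than add a direct dependency. The fragment below is only a sketch using sbt's dependencyOverrides setting. It is untested against this project, and whether it resolves the NoClassDefFoundError here is an assumption:

```scala
// build.sbt (sketch): force the resolved version of xml-apis wherever it
// appears transitively, without declaring it as a direct dependency.
// Assumption: pinning to 1.4.01 has the same effect as the explicit dependency above.
dependencyOverrides += "xml-apis" % "xml-apis" % "1.4.01"
```

Running sbt dependencyTree again afterwards should show which xml-apis version was actually selected.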

My question is: why do I need to add that dependency? What is the underlying problem?

And while this works from the command line, it still fails with the same error when run from IntelliJ.
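To compare the two environments, a small diagnostic can print where a class is actually loaded from. This is a minimal sketch: WhichJar and locate are hypothetical names introduced here, and the second class name in main is only an example of a Xerces class to check.

```scala
// Diagnostic sketch: report the jar (or other location) a class is loaded from.
// WhichJar and locate are hypothetical helper names, not part of warcbase or Spark.
object WhichJar {
  def locate(className: String): String =
    try {
      val cls = Class.forName(className)
      // getCodeSource is null for classes that come from the JDK itself.
      Option(cls.getProtectionDomain.getCodeSource)
        .flatMap(cs => Option(cs.getLocation))
        .map(_.toString)
        .getOrElse("bootstrap class path (no code source)")
    } catch {
      case _: ClassNotFoundException => s"$className not found on the class path"
      case e: LinkageError           => s"$className failed to load: $e"
    }

  def main(args: Array[String]): Unit = {
    println(locate("org.w3c.dom.ElementTraversal"))
    println(locate("org.apache.xerces.parsers.DOMParser"))
  }
}
```

Running this once through sbt and once from IntelliJ should show whether the two class paths pick up xml-apis from different jars, or not at all.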

Also, it would be great to publish the warcbase-core artifact (snapshots and releases) to a repository. Do you plan to do that? Can I help with this?

dportabella commented 8 years ago

I also tried another approach: including warcbase as an unmanaged dependency. I put warcbase-core-0.1.0-SNAPSHOT-fatjar.jar in myproject/lib/ and keep a simple build.sbt:

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % sparkVersion
)

I get this error when I run it from the terminal:

$ sbt -Dspark.master=local[2] run

ERROR SparkContext - Error initializing SparkContext.
com.typesafe.config.ConfigException$Missing: No configuration setting found for key 'akka.event-handlers'

What could be the problem?
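The missing key is read from the Typesafe Config files on the class path, so one possibility (an assumption, not confirmed here) is that the fat jar carries its own reference.conf that conflicts with the one Spark expects. The sketch below lists every reference.conf visible to the application. ListReferenceConfs and find are hypothetical names used only for illustration.

```scala
// Diagnostic sketch: list every copy of a resource visible on the class path.
// ListReferenceConfs and find are hypothetical names introduced for this example.
object ListReferenceConfs {
  def find(resource: String): List[String] = {
    val urls = getClass.getClassLoader.getResources(resource)
    var found = List.empty[String]
    while (urls.hasMoreElements) found = urls.nextElement().toString :: found
    found.reverse
  }

  def main(args: Array[String]): Unit =
    find("reference.conf").foreach(println)
}
```

If the fat jar in myproject/lib/ shows up in the output, its reference.conf would be worth inspecting for the akka section.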