hunterhector / dbpedia-spotlight

DBpedia Spotlight is a tool for automatically annotating mentions of DBpedia resources in text.
http://spotlight.dbpedia.org/

NoSuchMethodError when running ExtractOccsFromWikipedia #1

Closed hunterhector closed 12 years ago

hunterhector commented 12 years ago

This error only occurs when running the ExtractOccsFromWikipedia process on the English Wikipedia; the other processes in index.sh work fine.

However, the process did produce some output, about 3686 lines. The following is the first line:

Anarchism-p2l1  Family_resemblance  family resemblance   Anarchism is a political philosophy which considers the state undesirable, unnecessary and harmful, and instead promotes a stateless society, or anarchy. It seeks to diminish or even abolish authority in the conduct of human relations. Anarchists may widely disagree on what additional criteria are required in anarchism. Oxford Companion to Philosophy says, "there is no single defining position that all anarchists hold, and those considered anarchists at best share a certain family resemblance."    481 

echo $INDEX_CONFIG_FILE produces ../conf/indexing.properties, of course.

The output of mvn -version is:

Apache Maven 2.2.1 (r801777; 2009-08-07 03:16:01+0800)
Java version: 1.7.0
Java home: /opt/java/jre
Default locale: en_US, platform encoding: UTF-8
OS name: "linux" version: "3.3.5-1-arch" arch: "amd64" Family: "unix"

The following is the full stack trace of the error.

[hector@hector-arch index]$ mvn scala:run -DmainClass=org.dbpedia.spotlight.lucene.index.ExtractOccsFromWikipedia "-DaddArgs=$INDEX_CONFIG_FILE|output/occs.tsv"
[INFO] Scanning for projects...
[INFO] ------------------------------------------------------------------------
[INFO] Building DBpedia Spotlight Indexing
[INFO]    task-segment: [scala:run]
[INFO] ------------------------------------------------------------------------
[INFO] Preparing scala:run
[INFO] [install:install-file {execution: install-weka-jar}]
[INFO] Installing /home/hector/Researches/nlp/DBpedia_Spotlight/dbpedia-spotlight/index/../lib/weka-trunk.jar to /home/hector/.m2/repository/weka/weka/3.7.3/weka-3.7.3.jar
[INFO] [resources:resources {execution: default-resources}]
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] skip non existing resourceDirectory /home/hector/Researches/nlp/DBpedia_Spotlight/dbpedia-spotlight/index/src/main/resources
[INFO] [scala:add-source {execution: scala-compile-first}]
[INFO] Add Source directory: /home/hector/Researches/nlp/DBpedia_Spotlight/dbpedia-spotlight/index/src/main/scala
[INFO] Add Test Source directory: /home/hector/Researches/nlp/DBpedia_Spotlight/dbpedia-spotlight/index/src/test/scala
[INFO] [scala:compile {execution: scala-compile-first}]
[INFO] Checking for multiple versions of scala
[INFO] includes = [**/*.scala,**/*.java,]
[INFO] excludes = []
[INFO] Nothing to compile - all classes are up to date
[INFO] [compiler:compile {execution: default-compile}]
[INFO] Nothing to compile - all classes are up to date
[INFO] [resources:testResources {execution: default-testResources}]
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] skip non existing resourceDirectory /home/hector/Researches/nlp/DBpedia_Spotlight/dbpedia-spotlight/index/src/test/resources
[INFO] [compiler:testCompile {execution: default-testCompile}]
[INFO] No sources to compile
[INFO] [scala:run {execution: default-cli}]
[INFO] Checking for multiple versions of scala
 INFO 2012-05-18 14:10:10,642 main [IndexingConfiguration] - Loading configuration file ../conf/indexing.properties
 INFO 2012-05-18 14:10:10,748 main [ExtractOccsFromWikipedia$] - Loading concept URIs from output/conceptURIs.list...
 INFO 2012-05-18 14:10:17,567 main [ExtractOccsFromWikipedia$] - Loading redirects transitive closure from output/redirects_tc.tsv...
 INFO 2012-05-18 14:10:27,077 main [FileOccurrenceSource$] - Writing occurrences to file output/occs.tsv ...
java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:601)
    at org_scala_tools_maven_executions.MainHelper.runMain(MainHelper.java:161)
    at org_scala_tools_maven_executions.MainWithArgsInFile.main(MainWithArgsInFile.java:26)
Caused by: java.lang.NoSuchMethodError: scala.collection.immutable.StringOps.slice(II)Ljava/lang/String;
    at org.dbpedia.spotlight.io.DisambiguationContextSource$.getOccurrence(DisambiguationContextSource.scala:153)
    at org.dbpedia.spotlight.io.AllOccurrenceSource$AllOccurrenceSource$$anonfun$foreach$1$$anonfun$apply$1.apply(AllOccurrenceSource.scala:101)
    at org.dbpedia.spotlight.io.AllOccurrenceSource$AllOccurrenceSource$$anonfun$foreach$1$$anonfun$apply$1.apply(AllOccurrenceSource.scala:97)
    at scala.collection.LinearSeqOptimized$class.foreach(LinearSeqOptimized.scala:59)
    at scala.collection.immutable.List.foreach(List.scala:45)
    at org.dbpedia.spotlight.io.AllOccurrenceSource$AllOccurrenceSource$$anonfun$foreach$1.apply(AllOccurrenceSource.scala:97)
    at org.dbpedia.spotlight.io.AllOccurrenceSource$AllOccurrenceSource$$anonfun$foreach$1.apply(AllOccurrenceSource.scala:78)
    at org.dbpedia.extraction.sources.WikipediaDumpParser.readPage(WikipediaDumpParser.java:218)
    at org.dbpedia.extraction.sources.WikipediaDumpParser.readPages(WikipediaDumpParser.java:159)
    at org.dbpedia.extraction.sources.WikipediaDumpParser.readDump(WikipediaDumpParser.java:107)
    at org.dbpedia.extraction.sources.WikipediaDumpParser.run(WikipediaDumpParser.java:87)
    at org.dbpedia.extraction.sources.XMLSource$XMLFileSource.foreach(XMLSource.scala:40)
    at org.dbpedia.spotlight.io.AllOccurrenceSource$AllOccurrenceSource.foreach(AllOccurrenceSource.scala:78)
    at org.dbpedia.spotlight.filter.Filter$FilteredOccs.foreach(Filter.scala:53)
    at org.dbpedia.spotlight.filter.Filter$FilteredOccs.foreach(Filter.scala:53)
    at org.dbpedia.spotlight.filter.Filter$FilteredOccs.foreach(Filter.scala:53)
    at org.dbpedia.spotlight.io.FileOccurrenceSource$.writeToFile(FileOccurrenceSource.scala:59)
    at org.dbpedia.spotlight.lucene.index.ExtractOccsFromWikipedia$.main(ExtractOccsFromWikipedia.scala:79)
    at org.dbpedia.spotlight.lucene.index.ExtractOccsFromWikipedia.main(ExtractOccsFromWikipedia.scala)
    ... 6 more
[INFO] ------------------------------------------------------------------------
[ERROR] BUILD ERROR
[INFO] ------------------------------------------------------------------------
[INFO] wrap: org.apache.commons.exec.ExecuteException: Process exited with an error: 240(Exit value: 240)

[INFO] ------------------------------------------------------------------------
[INFO] For more information, run Maven with the -e switch
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 24 seconds
[INFO] Finished at: Fri May 18 14:10:33 HKT 2012
[INFO] Final Memory: 34M/344M
[INFO] ------------------------------------------------------------------------

pablomendes commented 12 years ago

Hi Hector, thanks! Please also show us the content of $INDEX_CONFIG_FILE

echo  $INDEX_CONFIG_FILE

Please note that for better readability you can also mark up code (and stack traces) with three backticks (see: Fenced code blocks http://github.github.com/github-flavored-markdown/)

What happens if you change line 153 to use the "substring" method instead of "slice"? You can also do a "try/catch" and print out the line you were extracting when the error happened.

Can you please test these things and update this issue?

I tested a few things to see if the problem was with an empty string, but it doesn't seem so:

scala> val test = "This is a test"
test: java.lang.String = This is a test

scala> test.slice(1,3)
res0: scala.runtime.RichString = hi

scala> test.substring(1,3)
res1: java.lang.String = hi

scala> test.slice(1,0)    
res2: scala.runtime.RichString = 

scala> test.slice(0,0)
res3: scala.runtime.RichString = 

scala> test.slice(0,-1)
res4: scala.runtime.RichString = 

scala> test.slice(0,100)
res8: scala.runtime.RichString = This is a test
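
One caveat if slice is swapped for substring as suggested above: substring is java.lang.String's own method and throws StringIndexOutOfBoundsException for out-of-range indices, whereas the slice calls in the session above quietly clamp them. A minimal Java sketch of a substring-based helper that reproduces the clamping seen in that session follows. The names SliceDemo and clampedSlice are hypothetical and not taken from the Spotlight code.

```java
final class SliceDemo {

    /** substring with slice-like clamping: out-of-range or inverted bounds give a shorter or empty result. */
    static String clampedSlice(String s, int from, int until) {
        int lo = Math.max(from, 0);            // negative start is treated as 0
        int hi = Math.min(until, s.length());  // end past the string is treated as its length
        return lo >= hi ? "" : s.substring(lo, hi);
    }

    public static void main(String[] args) {
        String test = "This is a test";
        System.out.println(clampedSlice(test, 1, 3));    // hi
        System.out.println(clampedSlice(test, 0, 100));  // This is a test
        try {
            test.substring(0, 100);                      // plain substring rejects the same bounds
        } catch (StringIndexOutOfBoundsException e) {
            System.out.println("substring threw: " + e.getMessage());
        }
    }
}
```

Whether line 153 can ever see out-of-range indices is not known from this thread, so the helper only matters if it can.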

maxjakob commented 12 years ago

The method is known at compile time. At run time, you get the NoSuchMethodError. I suspect there is a version clash or something related.
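
One way to check for such a clash from inside the JVM: the NoSuchMethodError above names a full method descriptor, slice(II)Ljava/lang/String;, meaning a method called slice that takes two ints and returns java.lang.String. The JVM links a call by name, parameter types and return type together, so a same-named method with a different return type would not satisfy it. The Java sketch below uses reflection to ask whether the class actually loaded at run time has a public method with an exact signature, and where that class was loaded from. ClasspathProbe, hasMethod and locationOf are names made up for this illustration; it assumes the probe is run with the same classpath as the failing program.

```java
import java.lang.reflect.Method;
import java.security.CodeSource;
import java.util.Arrays;

final class ClasspathProbe {

    /** True if cls exposes a public method with exactly this name, return type and parameter types. */
    static boolean hasMethod(Class<?> cls, String name, Class<?> returnType, Class<?>... params) {
        for (Method m : cls.getMethods()) {  // public methods, including inherited ones
            if (m.getName().equals(name)
                    && m.getReturnType().equals(returnType)
                    && Arrays.equals(m.getParameterTypes(), params)) {
                return true;
            }
        }
        return false;
    }

    /** Jar or directory the class was loaded from, or a note for JDK/bootstrap classes. */
    static String locationOf(Class<?> cls) {
        CodeSource src = cls.getProtectionDomain().getCodeSource();
        return (src == null || src.getLocation() == null)
                ? "(bootstrap class path)"
                : src.getLocation().toString();
    }

    public static void main(String[] args) throws ClassNotFoundException {
        // Pass a class name as the first argument, e.g. scala.collection.immutable.StringOps.
        Class<?> cls = Class.forName(args.length > 0 ? args[0] : "java.lang.String");
        System.out.println(cls.getName() + " loaded from " + locationOf(cls));
        System.out.println("has slice(int, int) returning String? "
                + hasMethod(cls, "slice", String.class, int.class, int.class));
    }
}
```

If the printed location is not the Scala library the build is supposed to use, that points at the jar to inspect.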

pablomendes commented 12 years ago

It shows "[INFO] Checking for multiple versions of scala" but doesn't give a WARNING.

Please also show the result of "mvn -version". You should also try "mvn clean install".

Cheers, Pablo

maxjakob commented 12 years ago

Hector, I pulled your repo and the command works on my machine without the error. I still suspect there is a problem with versions. Do you have multiple versions of Scala installed?

Please try the following:

  1. delete your local Maven repo (~/.m2/repository/*)
  2. run mvn clean install from the root of DBpedia Spotlight
  3. run the command again

hunterhector commented 12 years ago

Well, after deleting my local repo and rerunning the Maven install, I ran into errors. These errors occurred before and then disappeared at some point (which is how I finally built the project), but now they are back:

[INFO] ------------------------------------------------------------------------
[ERROR] BUILD ERROR
[INFO] ------------------------------------------------------------------------
[INFO] Failed to resolve artifact.

Missing:
----------
1) org.apache.maven.shared:maven-shared-io:jar:1.1

  Try downloading the file manually from the project website.

  Then, install it using the command: 
      mvn install:install-file -DgroupId=org.apache.maven.shared -DartifactId=maven-shared-io -Dversion=1.1 -Dpackaging=jar -Dfile=/path/to/file

  Alternatively, if you host your own repository you can deploy the file there: 
      mvn deploy:deploy-file -DgroupId=org.apache.maven.shared -DartifactId=maven-shared-io -Dversion=1.1 -Dpackaging=jar -Dfile=/path/to/file -Durl=[url] -DrepositoryId=[id]

  Path to dependency: 
    1) org.apache.maven.plugins:maven-clean-plugin:maven-plugin:2.2
    2) org.apache.maven.shared:file-management:jar:1.2
    3) org.apache.maven.shared:maven-shared-io:jar:1.1

----------
1 required artifact is missing.

for artifact: 
  org.apache.maven.plugins:maven-clean-plugin:maven-plugin:2.2

from the specified remote repositories:
  apache.snapshots (http://people.apache.org/repo/m2-snapshot-repository),
  igetdb.sourceforge (http://igetdb.sourceforge.net/maven2-repository),
  opennlp.sf.net (http://opennlp.sourceforge.net/maven2/),
  scala-tools.org (http://scala-tools.org/repo-releases/),
  central (http://repo1.maven.org/maven2),
  anonsvn (http://anonsvn.icefaces.org/repo/maven2/releases/),
  maven2-repository.dev.java.net (http://download.java.net/maven/2/),
  java.net-Public (https://maven.java.net/content/groups/public/)

[INFO] ------------------------------------------------------------------------
[INFO] For more information, run Maven with the -e switch
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 4 minutes 17 seconds
[INFO] Finished at: Sat May 19 02:16:19 HKT 2012
[INFO] Final Memory: 15M/119M
[INFO] ------------------------------------------------------------------------

maxjakob commented 12 years ago

OK, this is getting very weird. Now it seems that not even the clean command works.

Can you paste more of the log messages? It's difficult to make out what the problem is from what is there.

I assume you already tried Google. Can you rule out a misconfiguration in your ~/.m2/settings.xml? http://stackoverflow.com/questions/4964704/maven-woes-maven-clean-plugin-not-found-in-repository

jcsahnwaldt commented 12 years ago

The culprit is most likely the dbpedia-2.0-SNAPSHOT-jar-with-dependencies.jar file that is in the git repo. It contains the whole Scala library, version 2.9.0. The slice() method was not in StringOps in that version.

slice() was moved to a base class of StringOps in 2.9.0:

https://github.com/scala/scala/blob/v2.9.0/src/library/scala/collection/immutable/StringOps.scala#L1

In 2.9.1, it was moved back into StringOps:

https://github.com/scala/scala/blob/v2.9.1/src/library/scala/collection/immutable/StringOps.scala#L1

Maven and IntelliJ apparently use different classpaths to compile and run the Spotlight code.

I'm not sure how to solve this. The dbpedia-2.0-SNAPSHOT-jar-with-dependencies.jar should be much smaller, but I don't know which of its dependencies are actually needed by DBpedia Spotlight.
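
A quick way to confirm that a fat jar bundles its own copy of the Scala library is to list its entries and count the names under scala/. The Java sketch below does that with java.util.zip. The class and method names (JarScan, entryNames, countUnder) are invented for this example, and the jar path is whatever is passed on the command line.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

final class JarScan {

    /** Names of all non-directory entries in the jar (a jar is a zip archive). */
    static List<String> entryNames(String jarPath) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipFile zip = new ZipFile(jarPath)) {
            for (ZipEntry e : Collections.list(zip.entries())) {
                if (!e.isDirectory()) {
                    names.add(e.getName());
                }
            }
        }
        return names;
    }

    /** How many entry names start with the given prefix, e.g. "scala/". */
    static long countUnder(List<String> names, String prefix) {
        return names.stream().filter(n -> n.startsWith(prefix)).count();
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java JarScan <path-to-jar>");
            return;
        }
        long n = countUnder(entryNames(args[0]), "scala/");
        System.out.println(n + " entries under scala/ in " + args[0]);
    }
}
```

A non-zero count for the jar in lib would mean its bundled Scala classes can end up on the classpath alongside the Scala library the project was compiled against.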

jcsahnwaldt commented 12 years ago

dbpedia-2.0-SNAPSHOT-jar-with-dependencies.jar is used here: https://github.com/hunterhector/dbpedia-spotlight/blob/master/index/pom.xml#L138

hunterhector commented 12 years ago

OK, I finally got Maven running after deleting my local repo. I had to run it many times before Maven downloaded all the dependencies (could something be wrong with my Maven settings?). Sorry Jakob, I didn't post the log because every time Maven was downloading things I was more or less cut off from the Internet. It took a long time until I finally resolved the dependencies again.

But sadly, the error persists: deleting ~/.m2/repository and running mvn clean install did not solve the issue. The comment that slice() lives in different places in different versions may be related. My version of Scala is:

Scala code runner version 2.9.2 -- Copyright 2002-2011, LAMP/EPFL

And StringOps in 2.9.2 looks the same as in 2.9.1: https://github.com/scala/scala/blob/v2.9.2/src/library/scala/collection/immutable/StringOps.scala#L1

However, mvn clean install should mean that I am compiling with Maven, and I then run the class with Maven as well. So how come the problem still exists, if it was caused by IntelliJ and Maven using different classpaths?

I am now going to try replacing slice() with something else to see what happens. I will also try to fix the version problem if I find a way to.

By the way, I've already written a Pig script to count the co-occurrences from the partial file I could get at the moment. I've committed the Pig script under /index/src/pig. I am very new to Pig, so I am not sure the script does what I want. But please allow me to solve the current issue before I validate my Pig script.

Thanks a lot for the help, Hector

hunterhector commented 12 years ago

The solution is to remove Scala from the dbpedia-2.0-SNAPSHOT-jar-with-dependencies.jar in lib.

It can be removed manually as follows (thanks, everyone, for finding the solution):

# unpack the jar into a scratch directory
mkdir lib/new-jar && cd lib/new-jar
jar xvf ../dbpedia-2.0-SNAPSHOT-jar-with-dependencies.jar
# drop the bundled Scala library classes
rm -rf scala
# repack over the original jar, then clean up
jar cvf ../dbpedia-2.0-SNAPSHOT-jar-with-dependencies.jar *
cd .. && rm -rf new-jar

I've also pushed the stripped jar to the repo:

https://github.com/hunterhector/dbpedia-spotlight/tree/master/lib

By the way, the jar pushed by Pablo to the original repo did not work for me. The jar I produced is 9.3 MB, but the jar Pablo pushed is 10.2 MB.

I am now closing this issue.