louismullie / stanford-core-nlp

Ruby bindings to the Stanford Core NLP tools (English, French, German).

GC Overhead limit exceeded when trying to run example code #31

Closed: chelsea closed this issue 9 years ago

chelsea commented 9 years ago

I'm just trying to get my config set up properly, but I can't get the example code to run.

My app is a simple Sinatra app with a single endpoint that executes the following (a fuller self-contained sketch is at the end of this comment):

  text = 'Angela Merkel met Nicolas Sarkozy on January 25th in ' +
   'Berlin to discuss a new austerity package. Sarkozy ' +
   'looked pleased, but Merkel was dismayed.'
  pipeline = StanfordCoreNLP.load(:tokenize, :ssplit, :pos, :lemma, :parse, :ner)
  text = StanfordCoreNLP::Annotation.new(text)
  pipeline.annotate(text)  # annotates in place; the Java annotate method returns void
  return text.to_json

And when the method is invoked, I get the following:

Adding annotator tokenize
TokenizerAnnotator: No tokenizer type provided. Defaulting to PTBTokenizer.
Adding annotator ssplit
edu.stanford.nlp.pipeline.AnnotatorImplementations:
Adding annotator pos
Reading POS tagger model from edu/stanford/nlp/models/pos-tagger/english-left3words/english-left3words-distsim.tagger ... done [1.0 sec].
Adding annotator lemma
Adding annotator parse
Loading parser from serialized file edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz ...done [1.6 sec].
Adding annotator ner
Loading classifier from edu/stanford/nlp/models/ner/english.all.3class.distsim.crf.ser.gz ... done [5.2 sec].
Loading classifier from edu/stanford/nlp/models/ner/english.muc.7class.distsim.crf.ser.gz ... Java::JavaLang::OutOfMemoryError - GC overhead limit exceeded:
    java.util.Arrays.copyOf(java/util/Arrays.java:3332)
    java.lang.AbstractStringBuilder.expandCapacity(java/lang/AbstractStringBuilder.java:137)
    java.lang.AbstractStringBuilder.ensureCapacityInternal(java/lang/AbstractStringBuilder.java:121)
    java.lang.AbstractStringBuilder.append(java/lang/AbstractStringBuilder.java:569)
    java.lang.StringBuilder.append(java/lang/StringBuilder.java:190)
    java.io.ObjectInputStream$BlockDataInputStream.readUTFSpan(java/io/ObjectInputStream.java:3147)
    java.io.ObjectInputStream$BlockDataInputStream.readUTFBody(java/io/ObjectInputStream.java:3055)
    java.io.ObjectInputStream$BlockDataInputStream.readUTF(java/io/ObjectInputStream.java:2867)
    java.io.ObjectInputStream.readString(java/io/ObjectInputStream.java:1639)
    java.io.ObjectInputStream.readObject0(java/io/ObjectInputStream.java:1342)
    java.io.ObjectInputStream.readObject(java/io/ObjectInputStream.java:371)
    java.util.HashMap.readObject(java/util/HashMap.java:1394)
    java.lang.reflect.Method.invoke(java/lang/reflect/Method.java:483)
    java.io.ObjectStreamClass.invokeReadObject(java/io/ObjectStreamClass.java:1017)
    java.io.ObjectInputStream.readSerialData(java/io/ObjectInputStream.java:1896)
    java.io.ObjectInputStream.readOrdinaryObject(java/io/ObjectInputStream.java:1801)
    java.io.ObjectInputStream.readObject0(java/io/ObjectInputStream.java:1351)
    java.io.ObjectInputStream.defaultReadFields(java/io/ObjectInputStream.java:1993)
    java.io.ObjectInputStream.readSerialData(java/io/ObjectInputStream.java:1918)
    java.io.ObjectInputStream.readOrdinaryObject(java/io/ObjectInputStream.java:1801)
    java.io.ObjectInputStream.readObject0(java/io/ObjectInputStream.java:1351)
    java.io.ObjectInputStream.readObject(java/io/ObjectInputStream.java:371)
    edu.stanford.nlp.ie.crf.CRFClassifier.loadClassifier(edu/stanford/nlp/ie/crf/CRFClassifier.java:2607)
    edu.stanford.nlp.ie.AbstractSequenceClassifier.loadClassifier(edu/stanford/nlp/ie/AbstractSequenceClassifier.java:1666)
    edu.stanford.nlp.ie.AbstractSequenceClassifier.loadClassifier(edu/stanford/nlp/ie/AbstractSequenceClassifier.java:1721)
    edu.stanford.nlp.ie.AbstractSequenceClassifier.loadClassifier(edu/stanford/nlp/ie/AbstractSequenceClassifier.java:1708)
    edu.stanford.nlp.ie.crf.CRFClassifier.getClassifier(edu/stanford/nlp/ie/crf/CRFClassifier.java:2836)
    edu.stanford.nlp.ie.ClassifierCombiner.loadClassifierFromPath(edu/stanford/nlp/ie/ClassifierCombiner.java:189)
    edu.stanford.nlp.ie.ClassifierCombiner.loadClassifiers(edu/stanford/nlp/ie/ClassifierCombiner.java:173)
    edu.stanford.nlp.ie.ClassifierCombiner.<init>(edu/stanford/nlp/ie/ClassifierCombiner.java:113)

I'm using the latest version of Stanford CoreNLP (3.5.0), downloaded into a lib directory and loaded with the following; the latest tagger version is also in lib/taggers:

StanfordCoreNLP.jar_path = File.join(File.dirname(__FILE__), 'lib/')
StanfordCoreNLP.model_path = File.join(File.dirname(__FILE__), 'lib/')
StanfordCoreNLP.use :english
StanfordCoreNLP.model_files = {}
StanfordCoreNLP.default_jars = [
  'joda-time.jar',
  'xom.jar',
  'stanford-corenlp-3.5.0.jar',
  'stanford-corenlp-3.5.0-models.jar',
  'jollyday.jar',
  'bridge.jar'
]

And here is my Java and JRuby setup:

$ java -version
java version "1.8.0_25"
Java(TM) SE Runtime Environment (build 1.8.0_25-b17)
Java HotSpot(TM) 64-Bit Server VM (build 25.25-b02, mixed mode)
$ javac -version
javac 1.8.0_25
$ ruby --version
jruby 1.7.13 (1.9.3p392) 2014-06-24 43f133c on Java HotSpot(TM) 64-Bit Server VM 1.8.0_25-b17 +indy [darwin-x86_64]

I've tried playing around with the StanfordCoreNLP.jvm_args option, bumping the heap up to as much as 4 GB, but I still hit the same issue.
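For reference, I'm setting it before the first StanfordCoreNLP.load call (which is when the JVM gets started), roughly like this; the 4 GB value was my last attempt:

  # must run before the first StanfordCoreNLP.load, since that call boots the JVM
  StanfordCoreNLP.jvm_args = ['-Xmx4g']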

Any ideas how to get this sorted?
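For completeness, the whole app is essentially just the following; this is a minimal sketch rather than my exact file, and the route name and requires are approximate:

  # minimal sketch of the app; route name and requires are approximate
  require 'sinatra'
  require 'stanford-core-nlp'
  require 'json'

  get '/annotate' do
    text = 'Angela Merkel met Nicolas Sarkozy on January 25th in ' +
           'Berlin to discuss a new austerity package. Sarkozy ' +
           'looked pleased, but Merkel was dismayed.'
    # loading the pipeline inside the handler reloads every model on each request;
    # the OutOfMemoryError above is thrown during this load
    pipeline = StanfordCoreNLP.load(:tokenize, :ssplit, :pos, :lemma, :parse, :ner)
    annotation = StanfordCoreNLP::Annotation.new(text)
    pipeline.annotate(annotation)
    annotation.to_json
  end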

chelsea commented 9 years ago

Looks like it works when I pass the heap size straight through on the command line (JRuby's -J prefix forwards flags to the underlying JVM):

jruby -J-Xmx2048m app.rb

I was also able to get the following error, which makes it look like neither the JVM options I'm setting manually nor the gem's defaults are being respected in my setup:

Error: Your application used more memory than the safety cap of 500M.
Specify -J-Xmx####m to increase it (#### = cap size in MB).
Specify -w for full OutOfMemoryError stack trace
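To confirm which flags the JVM actually received, this kind of check works from JRuby (standard Java management API, not part of the original logs):

  # JRuby: inspect the options the running JVM was started with
  require 'java'

  args = java.lang.management.ManagementFactory.getRuntimeMXBean.getInputArguments.to_a
  puts args.inspect  # should include the -Xmx setting if it was respected

  max_heap_mb = java.lang.Runtime.getRuntime.maxMemory / (1024 * 1024)
  puts "effective max heap: #{max_heap_mb} MB"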

Exporting JAVA_OPTS in the environment seems to take care of this, e.g. as a Procfile entry:

web: export JAVA_OPTS="-Xmx3g"; bundle exec #{run your app}
Demetrio92 commented 6 years ago

Sorry for necroposting, but Google shows this thread at the top for the query "stanford core nlp GC overhead limit exceeded", so for anyone landing here: the example on the Stanford CoreNLP webpage is misconfigured.

java -cp "*" -Xmx2g edu.stanford.nlp.pipeline.StanfordCoreNLP -annotators tokenize,ssplit,pos,lemma,ner,parse,dcoref -file input.txt

has the option -Xmx2g, which means "cap the Java heap at 2 GB". Increasing it to 4 GB solved the issue.
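That is, the same invocation with a larger heap (only the -Xmx value changes):

java -cp "*" -Xmx4g edu.stanford.nlp.pipeline.StanfordCoreNLP -annotators tokenize,ssplit,pos,lemma,ner,parse,dcoref -file input.txt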