louismullie / stanford-core-nlp

Ruby bindings to the Stanford Core NLP tools (English, French, German).
Other

Using stanford-corenlp-3.7.0 #47

Closed danbeggan closed 7 years ago

danbeggan commented 7 years ago

I followed the instructions for using a more recent version of Stanford CoreNLP, setting a custom jar_path and default_jars.

This is the console output of the exception:

/Users/danielbeggan/.rbenv/versions/2.3.1/lib/ruby/gems/2.3.0/gems/stanford-core-nlp-0.5.3/lib/stanford-core-nlp.rb:187:in `method_missing': Error creating edu.stanford.nlp.time.TimeExpressionExtractorImpl (ReflectionLoading$ReflectionLoadingException)
    from /Users/danielbeggan/.rbenv/versions/2.3.1/lib/ruby/gems/2.3.0/gems/stanford-core-nlp-0.5.3/lib/stanford-core-nlp.rb:187:in `load'
    from main.rb:20:in `<main>'

This is my main.rb file, up to where the pipeline is set up:

StanfordCoreNLP.jvm_args = ['-Xms2G', '-Xmx2G']
StanfordCoreNLP.use :english
StanfordCoreNLP.jar_path = File.dirname(__FILE__) + '/bin/'
StanfordCoreNLP.model_files = {}
StanfordCoreNLP.default_jars = [
  'joda-time.jar',
  'xom.jar',
  'stanford-corenlp-3.7.0.jar',
  'stanford-corenlp-3.7.0-models.jar',
  'jollyday.jar',
  'bridge.jar'
]

text = 'Angela Merkel met Nicolas Sarkozy on January 25th in Berlin to discuss a new austerity package.'
pipeline = StanfordCoreNLP.load(:tokenize, :ssplit, :pos, :lemma, :parse, :ner, :dcoref)

Any help would be greatly appreciated!
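One cheap thing to rule out before digging into the exception is a typo or missing file in the custom jar list. The sketch below is a hypothetical helper, not part of the stanford-core-nlp gem. It only checks that each name in a default_jars-style list exists under jar_path, and it assumes the gem locates each jar by joining those two values.

```ruby
# Hypothetical pre-flight check (not part of the stanford-core-nlp gem):
# return the entries of a default_jars-style list that are not regular
# files under jar_path. File.join copes with jar_path ending in '/'.
def missing_jars(jar_path, jars)
  jars.reject { |jar| File.file?(File.join(jar_path, jar)) }
end

if __FILE__ == $PROGRAM_NAME
  # Same directory and jar names as the main.rb above.
  jar_path = File.dirname(__FILE__) + '/bin/'
  jars = ['joda-time.jar', 'xom.jar', 'stanford-corenlp-3.7.0.jar',
          'stanford-corenlp-3.7.0-models.jar', 'jollyday.jar', 'bridge.jar']
  missing = missing_jars(jar_path, jars)
  puts(missing.empty? ? 'All jars present.' : "Missing: #{missing.join(', ')}")
end
```

If every jar is present, the problem lies elsewhere, for example in which versions of the jars are being combined.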

danbeggan commented 7 years ago

I came across this in my searches and got it working by setting the ner.useSUTime property to 0:

StanfordCoreNLP.custom_properties = { 'ner.useSUTime' => '0' }
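The same workaround can be wrapped in a small helper if you toggle several properties. This is a hypothetical sketch, not a gem API. It assumes the gem copies custom_properties into a java.util.Properties object, whose setProperty method takes String arguments, so it coerces every key and value with to_s before they reach Java.

```ruby
# Hypothetical helper (not a stanford-core-nlp API): build a hash suitable
# for StanfordCoreNLP.custom_properties with every key and value coerced to
# a String. Assumption: the gem copies these pairs into java.util.Properties,
# whose setProperty(String, String) accepts strings only.
def corenlp_properties(overrides = {})
  overrides.each_with_object({}) do |(key, value), props|
    props[key.to_s] = value.to_s
  end
end

# Equivalent to the workaround above:
# StanfordCoreNLP.custom_properties = corenlp_properties('ner.useSUTime' => 0)
```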

arbox commented 7 years ago

We're still under heavy development. The most recent version which runs smoothly is 3.5.0. Please be patient :) And thank you for this report!
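Since 3.5.0 is the newest CoreNLP release reported here as running smoothly, a script can warn when its jar list points at something newer. This is a minimal sketch with hypothetical helper names. It assumes the main jar follows the stanford-corenlp-X.Y.Z.jar naming used in this thread. It cannot catch every failure: the 3.5.0 report below hits the same exception.

```ruby
require 'rubygems' # provides Gem::Version (loaded by default in modern Ruby)

# Newest CoreNLP version the maintainers described as running smoothly.
SMOOTH_CORENLP_VERSION = Gem::Version.new('3.5.0')

# Hypothetical helper: extract the version from the main CoreNLP jar name,
# e.g. 'stanford-corenlp-3.7.0.jar' -> 3.7.0. The models jar is skipped
# because its name does not end in "<version>.jar". Returns nil if absent.
def corenlp_version(jars)
  jars.each do |jar|
    match = /\Astanford-corenlp-(\d+(?:\.\d+)*)\.jar\z/.match(jar)
    return Gem::Version.new(match[1]) if match
  end
  nil
end

# True when the jar list targets a CoreNLP release newer than 3.5.0.
def newer_than_smooth_version?(jars)
  version = corenlp_version(jars)
  !version.nil? && version > SMOOTH_CORENLP_VERSION
end

if __FILE__ == $PROGRAM_NAME
  jars = ['joda-time.jar', 'xom.jar', 'stanford-corenlp-3.7.0.jar',
          'stanford-corenlp-3.7.0-models.jar', 'jollyday.jar', 'bridge.jar']
  if newer_than_smooth_version?(jars)
    warn "CoreNLP #{corenlp_version(jars)} is newer than #{SMOOTH_CORENLP_VERSION}"
  end
end
```

Gem::Version is used instead of plain string comparison so that, for example, 3.10.0 compares as newer than 3.5.0.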

mwlang commented 7 years ago

This fails for me with 3.5.0 as well, using ruby 2.3.3p222 (2016-11-21 revision 56859) [x86_64-darwin16] and stanford-core-nlp-0.5.3:

Adding annotator tokenize
TokenizerAnnotator: No tokenizer type provided. Defaulting to PTBTokenizer.
Adding annotator ssplit
edu.stanford.nlp.pipeline.AnnotatorImplementations:
Adding annotator pos
Reading POS tagger model from edu/stanford/nlp/models/pos-tagger/english-left3words/english-left3words-distsim.tagger ... done [0.5 sec].
Adding annotator lemma
Adding annotator parse
Loading parser from serialized file edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz ...done [0.3 sec].
Adding annotator ner
Loading classifier from edu/stanford/nlp/models/ner/english.all.3class.distsim.crf.ser.gz ... done [3.4 sec].
Loading classifier from edu/stanford/nlp/models/ner/english.muc.7class.distsim.crf.ser.gz ... done [3.0 sec].
Loading classifier from edu/stanford/nlp/models/ner/english.conll.4class.distsim.crf.ser.gz ... done [2.1 sec].
sutime.binder.1.
Initializing JollyDayHoliday for sutime with classpath:edu/stanford/nlp/models/sutime/jollyday/Holidays_sutime.xml
/Users/mwlang/.rvm/gems/ruby-2.3.3/gems/stanford-core-nlp-0.5.3/lib/stanford-core-nlp.rb:187:in `method_missing': Error creating edu.stanford.nlp.time.TimeExpressionExtractorImpl (ReflectionLoading$ReflectionLoadingException)
    from /Users/mwlang/.rvm/gems/ruby-2.3.3/gems/stanford-core-nlp-0.5.3/lib/stanford-core-nlp.rb:187:in `load'
    from test_350.rb:38:in `<main>'

# Use the model files for a different language than English.
require 'stanford-core-nlp'

StanfordCoreNLP.jar_path = File.expand_path(File.dirname(__FILE__)) + '/350/'
StanfordCoreNLP.model_path = File.expand_path(File.dirname(__FILE__)) + '/350/taggers/'

StanfordCoreNLP.use :french
StanfordCoreNLP.model_files = {}

StanfordCoreNLP.default_jars = [
  'joda-time-2.9.9.jar',
  'jollyday-0.5.1.jar',
  'xom.jar',
  'stanford-corenlp-3.5.0.jar',
  'stanford-corenlp-3.5.0-models.jar',
  'bridge.jar',
]

text = 'Angela Merkel met Nicolas Sarkozy on January 25th in ' +
       'Berlin to discuss a new austerity package. Sarkozy ' +
       'looked pleased, but Merkel was dismayed.'

pipeline = StanfordCoreNLP.load(:tokenize, :ssplit, :pos, :lemma, :parse, :ner, :dcoref)
text = StanfordCoreNLP::Annotation.new(text)
pipeline.annotate(text)

text.get(:sentences).each do |sentence|
  # Syntactic dependencies
  puts sentence.get(:basic_dependencies).to_s
  sentence.get(:tokens).each do |token|
    # Default annotations for all tokens
    puts token.get(:value).to_s
    puts token.get(:original_text).to_s
    puts token.get(:character_offset_begin).to_s
    puts token.get(:character_offset_end).to_s
    # POS returned by the tagger
    puts token.get(:part_of_speech).to_s
    # Lemma (base form of the token)
    puts token.get(:lemma).to_s
    # Named entity tag
    puts token.get(:named_entity_tag).to_s
    # Coreference
    puts token.get(:coref_cluster_id).to_s
    # Also of interest: coref, coref_chain,
    # coref_cluster, coref_dest, coref_graph.
  end
end

11:06:26:treat >> ls 350/**
-rwxr-xr-x@ 1 mwlang  staff   5.4K Oct 31  2014 350/CoreNLP-to-HTML.xsl
-rw-rw-r--@ 1 mwlang  staff   777B Oct 31  2014 350/LIBRARY-LICENSES
-rw-r--r--@ 1 mwlang  staff    18K Oct 31  2014 350/LICENSE.txt
-rw-r--r--@ 1 mwlang  staff   769B Oct 31  2014 350/Makefile
-rw-rw-r--@ 1 mwlang  staff   3.0K Oct 31  2014 350/README.txt
-rw-rw-r--@ 1 mwlang  staff   2.2K Oct 31  2014 350/SemgrexDemo.java
-rw-rw-r--@ 1 mwlang  staff   1.6K Oct 31  2014 350/ShiftReduceDemo.java
-rw-rw-r--@ 1 mwlang  staff   2.7K Oct 31  2014 350/StanfordCoreNlpDemo.java
-rw-rw-r--@ 1 mwlang  staff   280K Oct 31  2014 350/StanfordDependenciesManual.pdf
-rw-r--r--@ 1 mwlang  staff   915B Apr 17 23:00 350/bridge.jar
-rw-rw-r--@ 1 mwlang  staff   3.9K Oct 31  2014 350/build.xml
-rwxrwxr-x@ 1 mwlang  staff   652B Oct 31  2014 350/corenlp.sh
-rw-r--r--@ 1 mwlang  staff   1.2M Oct 31  2014 350/ejml-0.23-src.zip
-rw-r--r--@ 1 mwlang  staff   207K Oct 31  2014 350/ejml-0.23.jar
-rw-r--r--@ 1 mwlang  staff    89B Oct 31  2014 350/input.txt
-rw-r--r--@ 1 mwlang  staff    13K Oct 31  2014 350/input.txt.xml
-rw-r--r--@ 1 mwlang  staff    54K Oct 31  2014 350/javax.json-api-1.0-sources.jar
-rw-r--r--@ 1 mwlang  staff    83K Oct 31  2014 350/javax.json.jar
-rw-r--r--@ 1 mwlang  staff   684K Apr 17 23:00 350/joda-time-2.1-sources.jar
-rw-r--r--@ 1 mwlang  staff   756K Apr 17 23:00 350/joda-time-2.9-sources.jar
-rw-r--r--@ 1 mwlang  staff   619K Apr 17 23:00 350/joda-time-2.9.9.jar
-rw-r--r--@ 1 mwlang  staff   615K Apr 17 23:00 350/joda-time.jar
-rw-r--r--@ 1 mwlang  staff   182K Oct 31  2014 350/jollyday-0.4.7-sources.jar
-rw-r--r--@ 1 mwlang  staff   192K Apr 17 23:00 350/jollyday-0.4.9-sources.jar
-rw-r--r--@ 1 mwlang  staff   209K Apr 17 23:00 350/jollyday-0.5.1.jar
-rw-r--r--@ 1 mwlang  staff   209K Apr 17 23:00 350/jollyday.jar
-rw-rw-r--@ 1 mwlang  staff   3.4K Oct 31  2014 350/pom.xml
-rw-rw-r--@ 1 mwlang  staff   7.2M Oct 31  2014 350/stanford-corenlp-3.5.0-javadoc.jar
-rw-rw-r--@ 1 mwlang  staff   232M Oct 31  2014 350/stanford-corenlp-3.5.0-models.jar
-rw-rw-r--@ 1 mwlang  staff   3.7M Oct 31  2014 350/stanford-corenlp-3.5.0-sources.jar
-rw-rw-r--@ 1 mwlang  staff   5.6M Oct 31  2014 350/stanford-corenlp-3.5.0.jar
-rw-r--r--@ 1 mwlang  staff   656K Oct 31  2014 350/xom-1.2.10-src.jar
-rw-r--r--@ 1 mwlang  staff   306K Oct 31  2014 350/xom.jar

350/patterns:
total 168
drwxrwxr-x@ 10 mwlang  staff   340B Oct 31  2014 .
drwxr-xr-x  40 mwlang  staff   1.3K Apr 17 23:00 ..
-rw-rw-r--@  1 mwlang  staff    11K Oct 31  2014 example.properties
-rw-r--r--@  1 mwlang  staff   1.0K Oct 31  2014 goldnames.txt
-rw-r--r--@  1 mwlang  staff    19B Oct 31  2014 goldplaces.txt
-rw-r--r--@  1 mwlang  staff    44B Oct 31  2014 names.txt
-rw-r--r--@  1 mwlang  staff    33B Oct 31  2014 otherpeople.txt
-rw-r--r--@  1 mwlang  staff    24B Oct 31  2014 places.txt
-rw-r--r--@  1 mwlang  staff    45K Oct 31  2014 presidents.txt
-rw-r--r--@  1 mwlang  staff   1.3K Oct 31  2014 stopwords.txt

350/sutime:
total 96
drwxrwxr-x@  5 mwlang  staff   170B Oct 31  2014 .
drwxr-xr-x  40 mwlang  staff   1.3K Apr 17 23:00 ..
-rw-r--r--@  1 mwlang  staff   6.3K Oct 31  2014 defs.sutime.txt
-rw-r--r--@  1 mwlang  staff   1.2K Oct 31  2014 english.holidays.sutime.txt
-rw-r--r--@  1 mwlang  staff    35K Oct 31  2014 english.sutime.txt

350/taggers:
total 21952
drwxr-xr-x  19 mwlang  staff   646B Apr 17 22:58 .
drwxr-xr-x  40 mwlang  staff   1.3K Apr 17 23:00 ..
-rw-r--r--@  1 mwlang  staff    18K Jul 25  2014 LICENSE.txt
-rw-rw-r--@  1 mwlang  staff    11K Oct 26  2014 README.txt
-rw-rw-r--@  1 mwlang  staff   834B Oct 26  2014 TaggerDemo.java
-rw-rw-r--@  1 mwlang  staff   2.2K Oct 26  2014 TaggerDemo2.java
-rw-r--r--@  1 mwlang  staff   6.1K Oct 26  2014 build.xml
drwxrwxr-x@  3 mwlang  staff   102B Oct 26  2014 data
drwxrwxr-x@ 41 mwlang  staff   1.4K Oct 26  2014 models
-rw-r--r--@  1 mwlang  staff   379B Jul 25  2014 sample-input.txt
-rw-r--r--@  1 mwlang  staff   619B Jul 25  2014 sample-output.txt
-rw-rw-r--@  1 mwlang  staff   3.4M Oct 26  2014 stanford-postagger-3.5.0-javadoc.jar
-rw-rw-r--@  1 mwlang  staff   1.9M Oct 26  2014 stanford-postagger-3.5.0-sources.jar
-rw-rw-r--@  1 mwlang  staff   2.7M Oct 26  2014 stanford-postagger-3.5.0.jar
-rw-r--r--@  1 mwlang  staff   155B Jul 25  2014 stanford-postagger-gui.bat
-rwxr-xr-x@  1 mwlang  staff   100B Jul 25  2014 stanford-postagger-gui.sh
-rw-r--r--@  1 mwlang  staff   242B Jul 25  2014 stanford-postagger.bat
-rw-rw-r--@  1 mwlang  staff   2.7M Oct 26  2014 stanford-postagger.jar
-rwxr-xr-x@  1 mwlang  staff   262B Jul 25  2014 stanford-postagger.sh

350/tokensregex:
total 32
drwxrwxr-x@  6 mwlang  staff   204B Oct 31  2014 .
drwxr-xr-x  40 mwlang  staff   1.3K Apr 17 23:00 ..
-rw-r--r--@  1 mwlang  staff    42B Oct 31  2014 color.input.txt
-rw-r--r--@  1 mwlang  staff   103B Oct 31  2014 color.properties
-rw-r--r--@  1 mwlang  staff   1.3K Oct 31  2014 color.rules.txt
-rw-r--r--@  1 mwlang  staff   1.0K Oct 31  2014 retokenize.txt

350/zips:
total 765704
drwxr-xr-x   4 mwlang  staff   136B Apr 17 22:57 .
drwxr-xr-x  40 mwlang  staff   1.3K Apr 17 23:00 ..
-rw-r--r--@  1 mwlang  staff   251M Apr 17 22:56 stanford-corenlp-full-2014-10-31.zip
-rw-r--r--@  1 mwlang  staff   122M Apr 17 22:56 stanford-postagger-full-2014-10-26.zip
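The 350/ listing above holds two joda-time jars (joda-time.jar and joda-time-2.9.9.jar) and two jollyday jars (jollyday.jar and jollyday-0.5.1.jar), plus several -sources and -javadoc jars. When deciding which single file each default_jars entry should name, a quick inventory can help. This is a hypothetical sketch, not gem functionality. Its version-stripping rule is an assumption that fits the file names in this thread.

```ruby
# Hypothetical helpers (not part of the gem) for inspecting a jar directory.

# Runtime jars only: -sources and -javadoc jars bundle source code and
# documentation, so they are skipped.
def runtime_jars(dir)
  Dir.glob(File.join(dir, '*.jar'))
     .map { |path| File.basename(path) }
     .reject { |name| name.end_with?('-sources.jar', '-javadoc.jar') }
     .sort
end

# Strip a trailing "-<version>" so joda-time-2.9.9.jar and joda-time.jar
# both map to "joda-time".
def library_name(jar)
  File.basename(jar, '.jar').sub(/-\d+(?:\.\d+)*\z/, '')
end

# Libraries present in more than one runtime jar, e.g. two joda-time copies.
def duplicate_libraries(dir)
  runtime_jars(dir)
    .group_by { |jar| library_name(jar) }
    .select { |_name, jars| jars.size > 1 }
end

if __FILE__ == $PROGRAM_NAME
  duplicate_libraries(ARGV.fetch(0, '350')).each do |name, jars|
    puts "#{name}: #{jars.join(', ')}"
  end
end
```

Pass the jar directory as the first argument (it defaults to 350) to print each duplicated library with its jar names.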

Any ideas? I don't want to turn off SUTime (the date/time parsing in NER), because that is exactly the functionality I'm trying to get at by trying NLP in my Ruby project.

FWIW, I get the same error with 3.7.0 as with 3.5.0 above.