louismullie / stanford-core-nlp

Ruby bindings to the Stanford Core NLP tools (English, French, German).

Missing files #5

Closed: MarkMT closed this issue 12 years ago

MarkMT commented 12 years ago

I am having trouble getting this gem to work by following the README. Details:

I. When I try to load the pipeline:

pipeline = StanfordCoreNLP.load(:tokenize, :ssplit, :pos, :lemma, :parse, :ner, :dcoref)

I get an error message complaining that ".../stanford-core-nlp-0.3.2/bin/dcoref/unknown.txt could not be found. You may need to download this file manually and/or set paths properly."

The cause seems to be the following in config.rb:


Models = {
  :dcoref => {
    :english => {
      'countries' => 'unknown.txt',          # Fix - can somebody provide this file?
      'states.provinces' => 'unknown.txt',   # Fix - can somebody provide this file?
    },
  },
  # ... (rest of the hash omitted)
}

Commenting out the two entries in the inner hash above seems to overcome this problem.

II. Attempting to load the pipeline now results in an error message complaining that ".../stanford-core-nlp-0.3.2/bin/classifiers/muc.7class.distsim.crf.ser.gz could not be found. You may need to download this file manually and/or set paths properly."

This seems to reflect two problems. First, the directory 'classifiers' is not present in any of the .zip files linked to in the README; they use the directory 'ner' instead. Second, the files within the 'ner' directory are prefixed with 'english.', but this prefix is missing from the file names specified in config.rb.

Changing the directory name and adding the prefix in config.rb seems to overcome this problem.
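
A small pre-flight check can show which of the files config.rb expects are actually missing before the pipeline is loaded. This is a minimal Ruby sketch, not part of the gem: the method name missing_model_files is hypothetical, and it assumes, judging only from the paths in the error messages above, that model files are looked up as bin/<subdirectory>/<file name>.

```ruby
# Hypothetical pre-flight check (not part of the gem): report the model
# files that would be looked up under bin_dir but are not on disk.
#
# expected maps a subdirectory of bin_dir (e.g. 'classifiers') to the
# file names config.rb points at. The bin/<subdir>/<file> layout is an
# assumption inferred from the error messages in this thread.
def missing_model_files(bin_dir, expected)
  paths = expected.map do |subdir, files|
    files.map { |name| File.join(bin_dir, subdir, name) }
  end.flatten
  # Keep only the paths that do not exist.
  paths.reject { |path| File.exist?(path) }
end
```

For example, calling missing_model_files('.../stanford-core-nlp-0.3.2/bin', { 'classifiers' => ['muc.7class.distsim.crf.ser.gz'], 'dcoref' => ['unknown.txt'] }) against the unpacked ZIP layout should report the paths behind errors I and II, since the classifier actually sits at ner/english.muc.7class.distsim.crf.ser.gz. That hash is an illustrative stand-in built from the file names above, not the gem's real configuration.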

III. With the changes above, I now get the following error message when I try to load the pipeline:


Adding annotator dcoref
ERROR: cannot create DeterministicCorefAnnotator!
edu.stanford.nlp.io.RuntimeIOException: java.io.IOException: Unable to resolve "edu/stanford/nlp/models/dcoref/countries" as either class path, filename or URL
    at edu.stanford.nlp.dcoref.Dictionaries.loadCountriesLists(Dictionaries.java:232)
    at edu.stanford.nlp.dcoref.Dictionaries.<init>(Dictionaries.java:312)
    at edu.stanford.nlp.dcoref.Dictionaries.<init>(Dictionaries.java:274)
    at edu.stanford.nlp.dcoref.SieveCoreferenceSystem.<init>(SieveCoreferenceSystem.java:222)
    at edu.stanford.nlp.pipeline.DeterministicCorefAnnotator.<init>(DeterministicCorefAnnotator.java:51)
    at edu.stanford.nlp.pipeline.StanfordCoreNLP$13.create(StanfordCoreNLP.java:632)
    at edu.stanford.nlp.pipeline.StanfordCoreNLP$13.create(StanfordCoreNLP.java:629)
    at edu.stanford.nlp.pipeline.AnnotatorPool.get(AnnotatorPool.java:62)
    at edu.stanford.nlp.pipeline.StanfordCoreNLP.construct(StanfordCoreNLP.java:329)
    at edu.stanford.nlp.pipeline.StanfordCoreNLP.<init>(StanfordCoreNLP.java:196)
    at edu.stanford.nlp.pipeline.StanfordCoreNLP.<init>(StanfordCoreNLP.java:186)
    at edu.stanford.nlp.pipeline.StanfordCoreNLP.<init>(StanfordCoreNLP.java:178)
Caused by: java.io.IOException: Unable to resolve "edu/stanford/nlp/models/dcoref/countries" as either class path, filename or URL
    at edu.stanford.nlp.io.IOUtils.getInputStreamFromURLOrClasspathOrFileSystem(IOUtils.java:296)
    at edu.stanford.nlp.dcoref.Dictionaries.loadCountriesLists(Dictionaries.java:225)
    ... 11 more
RuntimeException: edu.stanford.nlp.io.RuntimeIOException: java.io.IOException: Unable to resolve "edu/stanford/nlp/models/dcoref/countries" as either class path, filename or URL
    from /home/mark/.rvm/gems/ruby-1.8.7-head@rails3test/gems/stanford-core-nlp-0.3.2/lib/stanford-core-nlp.rb:150:in `new'
    from /home/mark/.rvm/gems/ruby-1.8.7-head@rails3test/gems/stanford-core-nlp-0.3.2/lib/stanford-core-nlp.rb:150:in `load'
    from (irb):4


I can't find 'edu/stanford/nlp/models/dcoref/countries' in any of the jars in the 'bin' directory. I don't know how to fix this problem.
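
One way to confirm that a resource such as edu/stanford/nlp/models/dcoref/countries is really absent is to list the entries of every jar in 'bin'. A jar is a ZIP archive and Ruby's standard library has no ZIP reader, so this minimal sketch reads the ZIP central directory by hand. It is an assumption-laden helper, not part of the gem: zip_entry_names and jars_containing are hypothetical names, ZIP64 archives are not supported, and it assumes no archive comment happens to contain the end-of-central-directory signature.

```ruby
# Signatures from the ZIP format (little-endian 32-bit values).
EOCD_SIG = [0x06054b50].pack('V') # end of central directory record
CEN_SIG  = [0x02014b50].pack('V') # central directory file header

# Hypothetical helper: return the entry names stored in a .jar/.zip file.
def zip_entry_names(path)
  data = File.open(path, 'rb') { |f| f.read }
  eocd = data.rindex(EOCD_SIG)
  raise ArgumentError, "#{path}: no end-of-central-directory record" unless eocd
  # At EOCD offset 10: total entry count (2 bytes), directory size (4),
  # directory offset (4).
  count, _size, offset = data[eocd + 10, 10].unpack('vVV')
  names = []
  pos = offset
  count.times do
    raise ArgumentError, "#{path}: corrupt central directory" unless data[pos, 4] == CEN_SIG
    # File name, extra field and comment lengths live at header offset 28.
    name_len, extra_len, comment_len = data[pos + 28, 6].unpack('vvv')
    names << data[pos + 46, name_len] # the file name starts at offset 46
    pos += 46 + name_len + extra_len + comment_len
  end
  names
end

# Hypothetical helper: jars in dir with an entry whose name starts with entry.
def jars_containing(dir, entry)
  Dir.glob(File.join(dir, '*.jar')).sort.select do |jar|
    zip_entry_names(jar).any? { |name| name.start_with?(entry) }
  end
end
```

For example, jars_containing('.../stanford-core-nlp-0.3.2/bin', 'edu/stanford/nlp/models/dcoref/countries') returns an empty array when no jar has a matching entry, which would agree with the IOException above. From a shell, 'jar tf' or 'unzip -l' on each jar gives the same listing.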

louismullie commented 12 years ago

Hi Mark,

Thanks for taking the time to report this.

All these problems are due to me improperly formatting the directory layout and file names in the most recent release of the ZIP files.

I'll update them ASAP, possibly today and Saturday at the latest.

Thanks again, Louis

louismullie commented 12 years ago

By the way, a quick fix:

  1. Rename the 'ner' directory to 'classifiers' and remove the 'english.' prefix from all the files in it.
  2. Create an empty file named unknown.txt in dcoref.
  3. Don't make any changes to the code.
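
Steps 1 and 2 above can be scripted. This is a minimal Ruby sketch, assuming bin_dir is the gem's bin directory (the .../stanford-core-nlp-0.3.2/bin path from the error messages) with the ZIP contents already unpacked into it; the method name apply_quick_fix is hypothetical and nothing like it ships with the gem. Step 3 needs no code.

```ruby
require 'fileutils'

# Hypothetical helper automating steps 1 and 2 of the quick fix.
# Assumes bin_dir is the gem's bin directory with the ZIP contents
# already unpacked into it.
def apply_quick_fix(bin_dir)
  ner_dir         = File.join(bin_dir, 'ner')
  classifiers_dir = File.join(bin_dir, 'classifiers')

  # Step 1a: rename ner/ to classifiers/, unless classifiers/ already exists.
  if File.directory?(ner_dir) && !File.exist?(classifiers_dir)
    FileUtils.mv(ner_dir, classifiers_dir)
  end

  # Step 1b: strip the leading "english." from each file name in classifiers/.
  Dir.glob(File.join(classifiers_dir, 'english.*')).each do |path|
    target = File.join(classifiers_dir, File.basename(path).sub(/\Aenglish\./, ''))
    FileUtils.mv(path, target) unless File.exist?(target)
  end

  # Step 2: make sure an empty dcoref/unknown.txt placeholder exists.
  dcoref_dir = File.join(bin_dir, 'dcoref')
  FileUtils.mkdir_p(dcoref_dir)
  placeholder = File.join(dcoref_dir, 'unknown.txt')
  File.open(placeholder, 'w') {} unless File.exist?(placeholder)
end
```

The existence checks make the script safe to run twice: files that were already renamed, and a placeholder that already exists, are left alone.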

MarkMT commented 12 years ago

Thanks Louis! That works. I should have figured that out!

louismullie commented 12 years ago

Ok, updated the files. Thanks again for reporting!

MarkMT commented 12 years ago

Thanks, cheers.

On 06/18/2012 10:29 AM, Louis Mullie wrote:

Ok, updated the files. Thanks again for reporting!


Reply to this email directly or view it on GitHub: https://github.com/louismullie/stanford-core-nlp/issues/5#issuecomment-6399295