louismullie / stanford-core-nlp

Ruby bindings to the Stanford Core NLP tools (English, French, German).

Pipe breaks at ner #15

ghost closed 11 years ago

ghost commented 11 years ago

After fixing https://github.com/Organiz3r/stanford-core-nlp/commit/b47790389eea4a6ae7e6628873494c1be94caa33 :

When trying:

pipeline = StanfordCoreNLP.load(:tokenize, :ssplit, :pos, :lemma, :parse, :ner)

This happens:

Adding annotator tokenize
Adding annotator ssplit
Adding annotator pos
Loading default properties from tagger /Users/thomas/.rvm/gems/ruby-1.9.3-p327@core-nlp/gems/stanford-core-nlp-0.4.1/bin/taggers/english-left3words-distsim.tagger
Reading POS tagger model from /Users/thomas/.rvm/gems/ruby-1.9.3-p327@core-nlp/gems/stanford-core-nlp-0.4.1/bin/taggers/english-left3words-distsim.tagger ... done [1.8 sec].
Adding annotator lemma
Adding annotator parse
Loading parser from serialized file /Users/thomas/.rvm/gems/ruby-1.9.3-p327@core-nlp/gems/stanford-core-nlp-0.4.1/bin/grammar/englishPCFG.ser.gz ... done [1.7 sec].
Adding annotator ner
Loading classifier from /Users/thomas/edu/stanford/nlp/models/ner/english.all.3class.distsim.crf.ser.gz ... java.io.FileNotFoundException: edu/stanford/nlp/models/ner/english.all.3class.distsim.crf.ser.gz (No such file or directory)
        at java.io.FileInputStream.open(Native Method)
        at java.io.FileInputStream.<init>(FileInputStream.java:120)
        at edu.stanford.nlp.ie.AbstractSequenceClassifier.loadClassifier(AbstractSequenceClassifier.java:1651)
        at edu.stanford.nlp.ie.AbstractSequenceClassifier.loadClassifier(AbstractSequenceClassifier.java:1598)
        at edu.stanford.nlp.ie.AbstractSequenceClassifier.loadClassifier(AbstractSequenceClassifier.java:1581)
        at edu.stanford.nlp.ie.crf.CRFClassifier.getClassifier(CRFClassifier.java:3024)
        at edu.stanford.nlp.ie.ClassifierCombiner.loadClassifierFromPath(ClassifierCombiner.java:132)
        at edu.stanford.nlp.ie.ClassifierCombiner.loadClassifiers(ClassifierCombiner.java:116)
        at edu.stanford.nlp.ie.ClassifierCombiner.<init>(ClassifierCombiner.java:98)
        at edu.stanford.nlp.ie.NERClassifierCombiner.<init>(NERClassifierCombiner.java:64)
        at edu.stanford.nlp.pipeline.StanfordCoreNLP$6.create(StanfordCoreNLP.java:585)
        at edu.stanford.nlp.pipeline.AnnotatorPool.get(AnnotatorPool.java:80)
        at edu.stanford.nlp.pipeline.StanfordCoreNLP.construct(StanfordCoreNLP.java:301)
        at edu.stanford.nlp.pipeline.StanfordCoreNLP.<init>(StanfordCoreNLP.java:145)
        at edu.stanford.nlp.pipeline.StanfordCoreNLP.<init>(StanfordCoreNLP.java:141)
Loading classifier from /Users/thomas/edu/stanford/nlp/models/ner/english.all.3class.distsim.crf.ser.gz ... /Users/thomas/.rvm/gems/ruby-1.9.3-p327@core-nlp/gems/stanford-core-nlp-0.4.1/lib/stanford-core-nlp.rb:165:in `new': java.io.FileNotFoundException (RuntimeException)
        from /Users/thomas/.rvm/gems/ruby-1.9.3-p327@core-nlp/gems/stanford-core-nlp-0.4.1/lib/stanford-core-nlp.rb:165:in `load'
        from /Users/thomas/meuk.rb:7:in `<main>'

Apparently it looks for the models in the current directory (I was in /Users/thomas) + edu/stanford/nlp/models/ner. I have no idea why it's doing that, since it is able to find all the other CoreNLP files just fine, as evidenced by the above output. It should be looking for the proper files in {gem path}/bin/classifiers/, I think.
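For anyone hitting the same trace: this is what you would expect if a relative model path reaches the JVM unchanged, since a bare relative path is resolved against the working directory of the process. Below is a minimal Ruby sketch of the difference. The helper names and the `classifiers` subdirectory are assumptions for illustration (the latter taken from the guess above), not the gem's actual code.

```ruby
# Hypothetical helpers, not part of the stanford-core-nlp gem.

# Resolve a relative model path against the current working directory,
# which mirrors the failing lookup (cwd + edu/stanford/nlp/...).
def resolve_from_cwd(relative)
  File.expand_path(relative)
end

# Anchor the model's file name to an explicit bin directory instead, so the
# result no longer depends on where the script was started from.
# The "classifiers" subdirectory is an assumed layout.
def resolve_from_bin(relative, bin_dir)
  File.join(bin_dir, "classifiers", File.basename(relative))
end

relative = "edu/stanford/nlp/models/ner/english.all.3class.distsim.crf.ser.gz"
puts resolve_from_cwd(relative)
puts resolve_from_bin(relative, "/path/to/gem/bin")
```

The first line printed changes with the directory the script is run from; the second does not.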

louismullie commented 11 years ago

8b728cbc8f227c0e0eba2bc641a2ffcdc59d226b 334a2d48edd59f954648a04fc1d980afab3bcaff

louismullie commented 11 years ago

Alright, so these last two commits fix the bug that occurred because the Stanford NLP team broke their NER configuration API in the latest version of their software. Now it looks like there's a configuration problem with the SUTime package, which I don't know how to fix. I'm awaiting a response from their team.

ghost commented 11 years ago

:+1:

louismullie commented 11 years ago

e64600df12d4ccf04930d5dc7051e74cb665454d 8cf1b51a7ef188381dd58395e3af7a9de231657f

louismullie commented 11 years ago

Alright, everything is fixed now. I added some specs to make sure this doesn't break in the future, and pushed the version with the fixes as 0.4.3. You'll need to download this JAR file and put it in the /bin folder (it's now included in the latest packages).

Please note that from 0.4.3 on, JRuby 1.6.* is no longer supported. If you were using JRuby, you'll need to upgrade to 1.7.1.
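If you want a script to fail early on an old JRuby, a guard along these lines could work. It is only a sketch: the helper name is hypothetical, and treating 1.7.1 as the minimum supported version is an assumption based on the note above.

```ruby
require "rubygems"

# Assumed minimum, taken from the upgrade note above.
MINIMUM_JRUBY = Gem::Version.new("1.7.1")

# Hypothetical helper: true when the given JRuby version string is at
# least the assumed minimum.
def jruby_version_supported?(version_string)
  Gem::Version.new(version_string) >= MINIMUM_JRUBY
end

# JRUBY_VERSION is only defined when running under JRuby, so this guard
# does nothing on MRI.
if defined?(JRUBY_VERSION) && !jruby_version_supported?(JRUBY_VERSION)
  warn "JRuby #{JRUBY_VERSION} is older than #{MINIMUM_JRUBY}; please upgrade."
end
```

`Gem::Version` compares version segments numerically, so "1.10.0" sorts after "1.7.1", which a plain string comparison would get wrong.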

ghost commented 11 years ago

Excellent, thanks for the quick response!

ghost commented 11 years ago

OK, for me it works with the minimal English zip, but not with the full zip. It breaks at :parse. The minimal zip is fine for me, but I thought you might like to know, and this seemed the best place to report it.

pipeline = StanfordCoreNLP.load(:tokenize, :ssplit, :pos, :lemma, :parse, :ner, :dcoref)

Adding annotator tokenize
Adding annotator ssplit
Adding annotator pos
Adding annotator lemma
Adding annotator parse
Loading parser from serialized file edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz ...
java.io.IOException: Unable to resolve "edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz" as either class path, filename or URL
    at edu.stanford.nlp.io.IOUtils.getInputStreamFromURLOrClasspathOrFileSystem(IOUtils.java:408)
    at edu.stanford.nlp.io.IOUtils.readStreamFromString(IOUtils.java:356)
    at edu.stanford.nlp.parser.lexparser.LexicalizedParser.getParserFromSerializedFile(LexicalizedParser.java:530)
    at edu.stanford.nlp.parser.lexparser.LexicalizedParser.getParserFromFile(LexicalizedParser.java:328)
    at edu.stanford.nlp.parser.lexparser.LexicalizedParser.loadModel(LexicalizedParser.java:148)
    at edu.stanford.nlp.parser.lexparser.LexicalizedParser.loadModel(LexicalizedParser.java:134)
    at edu.stanford.nlp.pipeline.ParserAnnotator.loadModel(ParserAnnotator.java:147)
    at edu.stanford.nlp.pipeline.ParserAnnotator.<init>(ParserAnnotator.java:94)
    at edu.stanford.nlp.pipeline.StanfordCoreNLP$12.create(StanfordCoreNLP.java:777)
    at edu.stanford.nlp.pipeline.AnnotatorPool.get(AnnotatorPool.java:80)
    at edu.stanford.nlp.pipeline.StanfordCoreNLP.construct(StanfordCoreNLP.java:301)
    at edu.stanford.nlp.pipeline.StanfordCoreNLP.<init>(StanfordCoreNLP.java:145)
    at edu.stanford.nlp.pipeline.StanfordCoreNLP.<init>(StanfordCoreNLP.java:141)
Loading parser from text file edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz java.io.IOException: Unable to resolve "edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz" as either class path, filename or URL
    at edu.stanford.nlp.io.IOUtils.getInputStreamFromURLOrClasspathOrFileSystem(IOUtils.java:408)
    at edu.stanford.nlp.io.IOUtils.readReaderFromString(IOUtils.java:427)
    at edu.stanford.nlp.parser.lexparser.LexicalizedParser.getParserFromTextFile(LexicalizedParser.java:464)
    at edu.stanford.nlp.parser.lexparser.LexicalizedParser.getParserFromFile(LexicalizedParser.java:330)
    at edu.stanford.nlp.parser.lexparser.LexicalizedParser.loadModel(LexicalizedParser.java:148)
    at edu.stanford.nlp.parser.lexparser.LexicalizedParser.loadModel(LexicalizedParser.java:134)
    at edu.stanford.nlp.pipeline.ParserAnnotator.loadModel(ParserAnnotator.java:147)
    at edu.stanford.nlp.pipeline.ParserAnnotator.<init>(ParserAnnotator.java:94)
    at edu.stanford.nlp.pipeline.StanfordCoreNLP$12.create(StanfordCoreNLP.java:777)
    at edu.stanford.nlp.pipeline.AnnotatorPool.get(AnnotatorPool.java:80)
    at edu.stanford.nlp.pipeline.StanfordCoreNLP.construct(StanfordCoreNLP.java:301)
    at edu.stanford.nlp.pipeline.StanfordCoreNLP.<init>(StanfordCoreNLP.java:145)
    at edu.stanford.nlp.pipeline.StanfordCoreNLP.<init>(StanfordCoreNLP.java:141)
NullPointerException: unknown exception
    from /Users/thomas/.rvm/gems/ruby-1.9.3-p327@core-nlp/gems/stanford-core-nlp-0.4.3/lib/stanford-core-nlp.rb:176:in `new'
    from /Users/thomas/.rvm/gems/ruby-1.9.3-p327@core-nlp/gems/stanford-core-nlp-0.4.3/lib/stanford-core-nlp.rb:176:in `load'
    from (irb):11
    from /Users/thomas/.rvm/rubies/ruby-1.9.3-p327/bin/irb:18:in `<main>'

louismullie commented 11 years ago

Now fixed. Thanks for reporting.

nbrustein commented 10 years ago

I just tried to follow the instructions in the readme and hit something that looks like this same issue. I extracted the full zip and pasted its contents into the gem's bin directory. Then I copied the code from the "Using the gem" section in the readme and ran it. I'm using Ruby 1.9.3. My error is:

Loading classifier from /Users/nbrustein/code/german/edu/stanford/nlp/models/ner/english.all.3class.distsim.crf.ser.gz ... java.io.FileNotFoundException: edu/stanford/nlp/models/ner/english.all.3class.distsim.crf.ser.gz (No such file or directory)

Here is what my bin/ dir looks like. Am I doing something wrong?

ls /Users/nbrustein/.rvm/gems/ruby-1.9.3-p545@german/gems/stanford-core-nlp-0.5.1/bin
total 20656
drwxr-xr-x 18 nbrustein staff 612B Aug 31 12:01 .
drwxr-xr-x 6 nbrustein staff 204B Aug 31 11:53 ..
-rw-r--r-- 1 nbrustein staff 851B Aug 31 11:53 AnnotationBridge.java
-rw-r--r--@ 1 nbrustein staff 915B Aug 31 12:01 bridge.jar
drwxr-xr-x@ 8 nbrustein staff 272B Aug 31 12:01 classifiers
drwxr-xr-x@ 16 nbrustein staff 544B Aug 31 12:01 dcoref
drwxr-xr-x@ 3 nbrustein staff 102B Aug 31 12:01 gender
drwxr-xr-x@ 16 nbrustein staff 544B Aug 31 12:01 grammar
-rw-r--r--@ 1 nbrustein staff 557K Aug 31 12:01 joda-time.jar
-rw-r--r--@ 1 nbrustein staff 196K Aug 31 12:01 jollyday.jar
drwxr-xr-x@ 3 nbrustein staff 102B Aug 31 12:01 regexner
-rw-r--r--@ 1 nbrustein staff 4.2M Aug 31 12:01 stanford-corenlp.jar
-rw-r--r--@ 1 nbrustein staff 2.4M Aug 31 12:01 stanford-parser.jar
-rw-r--r--@ 1 nbrustein staff 2.4M Aug 31 12:01 stanford-segmenter.jar
drwxr-xr-x@ 7 nbrustein staff 238B Aug 31 12:01 sutime
drwxr-xr-x@ 33 nbrustein staff 1.1K Aug 31 12:01 taggers
drwxr-xr-x@ 5 nbrustein staff 170B Aug 31 12:01 truecase
-rw-r--r--@ 1 nbrustein staff 306K Aug 31 12:01 xom.jar
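
One way to narrow this down is to check whether the classifier named in the error exists anywhere under the gem's bin directory. Here is a rough sketch using only the Ruby standard library. `locate_model` is a hypothetical helper, not part of the gem, and the default "bin" argument is a placeholder for your own path.

```ruby
# Hypothetical diagnostic helper, not part of the stanford-core-nlp gem.
# Returns every path under bin_dir (at any depth) whose file name equals
# model_name, sorted for stable output.
def locate_model(bin_dir, model_name)
  Dir.glob(File.join(bin_dir, "**", model_name)).sort
end

if __FILE__ == $PROGRAM_NAME
  # Pass the gem's bin directory as the first argument; "bin" is a placeholder.
  bin_dir = ARGV.fetch(0, "bin")
  hits = locate_model(bin_dir, "english.all.3class.distsim.crf.ser.gz")
  puts hits.empty? ? "model not found under #{bin_dir}" : hits
end
```

If the file turns up (for example under classifiers/), the model is installed and the problem is how its path gets resolved. If it does not turn up, the extracted package is missing it.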