stanfordnlp / CoreNLP

CoreNLP: A Java suite of core NLP tools for tokenization, sentence segmentation, NER, parsing, coreference, sentiment analysis, etc.
http://stanfordnlp.github.io/CoreNLP/
GNU General Public License v3.0

How to use the neural coref model #796

Open panda0881 opened 5 years ago

panda0881 commented 5 years ago

I was trying to use the coreference models as baselines. It went well with the 'dcoref' and 'statistical' coref algorithms, but when I tried to use the 'neural' model, I got this error:

[pool-1-thread-2] INFO CoreNLP - [/127.0.0.1:55494] API call w/annotators tokenize,ssplit,pos,lemma,ner,depparse,coref
Uh-huh . It happened that I was going to have lunch with a friend , um , at noon . And then , the friend first sent me an SMS , Uh-huh . saying he would come pick me up to go together .
[pool-1-thread-2] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator tokenize
[pool-1-thread-2] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator ssplit
[pool-1-thread-2] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator pos
[pool-1-thread-2] INFO edu.stanford.nlp.tagger.maxent.MaxentTagger - Loading POS tagger from edu/stanford/nlp/models/pos-tagger/english-left3words/english-left3words-distsim.tagger ... done [0.6 sec].
[pool-1-thread-2] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator lemma
[pool-1-thread-2] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator ner
[pool-1-thread-2] INFO edu.stanford.nlp.ie.AbstractSequenceClassifier - Loading classifier from edu/stanford/nlp/models/ner/english.all.3class.distsim.crf.ser.gz ... done [1.3 sec].
[pool-1-thread-2] INFO edu.stanford.nlp.ie.AbstractSequenceClassifier - Loading classifier from edu/stanford/nlp/models/ner/english.muc.7class.distsim.crf.ser.gz ... done [0.6 sec].
[pool-1-thread-2] INFO edu.stanford.nlp.ie.AbstractSequenceClassifier - Loading classifier from edu/stanford/nlp/models/ner/english.conll.4class.distsim.crf.ser.gz ... done [0.6 sec].
[pool-1-thread-2] INFO edu.stanford.nlp.time.JollyDayHolidays - Initializing JollyDayHoliday for SUTime from classpath edu/stanford/nlp/models/sutime/jollyday/Holidays_sutime.xml as sutime.binder.1.
[pool-1-thread-2] INFO edu.stanford.nlp.time.TimeExpressionExtractorImpl - Using following SUTime rules: edu/stanford/nlp/models/sutime/defs.sutime.txt,edu/stanford/nlp/models/sutime/english.sutime.txt,edu/stanford/nlp/models/sutime/english.holidays.sutime.txt
[pool-1-thread-2] INFO edu.stanford.nlp.pipeline.TokensRegexNERAnnotator - ner.fine.regexner: Read 580704 unique entries out of 581863 from edu/stanford/nlp/models/kbp/english/gazetteers/regexner_caseless.tab, 0 TokensRegex patterns.
[pool-1-thread-2] INFO edu.stanford.nlp.pipeline.TokensRegexNERAnnotator - ner.fine.regexner: Read 4869 unique entries out of 4869 from edu/stanford/nlp/models/kbp/english/gazetteers/regexner_cased.tab, 0 TokensRegex patterns.
[pool-1-thread-2] INFO edu.stanford.nlp.pipeline.TokensRegexNERAnnotator - ner.fine.regexner: Read 585573 unique entries from 2 files
[pool-1-thread-2] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator depparse
[pool-1-thread-2] INFO edu.stanford.nlp.parser.nndep.DependencyParser - Loading depparse model: edu/stanford/nlp/models/parser/nndep/english_UD.gz ...
[pool-1-thread-2] INFO edu.stanford.nlp.parser.nndep.Classifier - PreComputed 99996, Elapsed Time: 8.211 (s)
[pool-1-thread-2] INFO edu.stanford.nlp.parser.nndep.DependencyParser - Initializing dependency parser ... done [9.4 sec].
[pool-1-thread-2] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator coref
[pool-1-thread-2] INFO edu.stanford.nlp.coref.neural.NeuralCorefAlgorithm - Loading coref model edu/stanford/nlp/models/coref/neural/english-model-default.ser.gz ... done [0.4 sec].
[pool-1-thread-2] INFO edu.stanford.nlp.coref.neural.NeuralCorefAlgorithm - Loading coref embeddings edu/stanford/nlp/models/coref/neural/english-embeddings.ser.gz ... done [0.4 sec].
[pool-1-thread-2] INFO edu.stanford.nlp.pipeline.CorefMentionAnnotator - Error with building coref mention annotator!
[pool-1-thread-2] INFO edu.stanford.nlp.pipeline.CorefMentionAnnotator - java.lang.ClassNotFoundException: edu.stanford.nlp.hcoref.md.MentionDetectionClassifier

For your information, I am using the latest version (3.9.2), and I'm calling the server through the stanford-corenlp Python wrapper (https://github.com/Lynten/stanford-corenlp).

# Lynten/stanford-corenlp wrapper
from stanfordcorenlp import StanfordCoreNLP

nlp = StanfordCoreNLP(r'/Users/stanford-corenlp-full-2018-10-05/', quiet=False)
props = {'annotators': 'coref', 'coref.algorithm': 'neural', 'pipelineLanguage': 'en'}
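
The request itself is then made roughly like this (the annotate method is the wrapper's, and the text variable is just illustrative):

result = nlp.annotate(text, properties=props)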

Can anyone help? Thank you very much!

J38 commented 5 years ago

I think the major issue here is that neural coref does not work with dependency parses alone (it was not trained to work with dependency-based mention detection), so you have to run the parse annotator (which is unfortunately slow). It may be sufficient to make that change in your annotators list: instead of just having 'annotators': 'coref', change it to 'annotators': 'tokenize,ssplit,pos,lemma,ner,parse,coref'.
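
For example, with the wrapper from your snippet, the per-request properties would become something like this (same dict as before, just with the full annotator list):

props = {'annotators': 'tokenize,ssplit,pos,lemma,ner,parse,coref',
         'coref.algorithm': 'neural',
         'pipelineLanguage': 'en'}
result = nlp.annotate(text, properties=props)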

Alternatively, you can follow the Python library's instructions for using an existing server and start the server yourself, independently of your Python process.

Add -serverProperties neural-coref.props to the Java command.
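
The server start command would then look something like this (memory, port, and timeout values here are just illustrative):

java -mx5g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer -port 9000 -timeout 60000 -serverProperties neural-coref.props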

Put these settings in neural-coref.props:

annotators = tokenize, ssplit, pos, lemma, ner, parse, coref
coref.algorithm = neural

Then call with no special properties in the request.
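
With the Python wrapper, that means pointing it at the running server and annotating without passing any properties (this assumes the wrapper's documented existing-server usage; host and port are illustrative):

nlp = StanfordCoreNLP('http://localhost', port=9000)
result = nlp.annotate(text)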

There are of course numerous ways to resolve this situation.

Also, I do think you've uncovered a bug: the dependency-based mention detection models we are releasing have not been updated to work with the current code. I'll try to add clearer documentation about which options are possible with the coref annotators.

panda0881 commented 5 years ago

Thanks for the reply. Calling the server through the Python API is still not working, but I got around the problem by first writing out a batch of documents and then parsing them directly with the Java command. So I think the problem is more likely in the server's support for the coreference package than in the coreference package itself.
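
For anyone hitting the same issue, the batch run I used was along these lines (the file list name and output format are just placeholders; the important parts are the full annotator list and coref.algorithm):

java -mx5g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLP -annotators tokenize,ssplit,pos,lemma,ner,parse,coref -coref.algorithm neural -filelist my_documents.txt -outputFormat json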