stanfordnlp / stanza

Stanford NLP Python library for tokenization, sentence segmentation, NER, and parsing of many human languages
https://stanfordnlp.github.io/stanza/
Other
7.28k stars 893 forks

Improve error message for missing annotators #105

Open ghost opened 5 years ago

ghost commented 5 years ago

I have the following MWE:

from stanfordnlp.server import CoreNLPClient
text = 'Barack Obama was born in the Hawaii. He was the president of the United States. '
prop = {'annotators': 'coref', 'coref.algorithm' : 'neural'}
with CoreNLPClient(properties=prop, timeout=60000, memory='16G', quietsss=False) as client:
    ann = client.annotate(text)    

The CORENLP_HOME environment variable is set correctly, but the code crashes:

D:\data\progetti_miei\corenlp_coref\stanfordnlp_official>python test_mine_bugreport.py
Starting server with command: java -Xmx16G -cp D:\data\programmi\StanfordCoreNLP\stanford-corenlp-full-2018-10-05/* edu.stanford.nlp.pipeline.StanfordCoreNLPServer -port 9000 -timeout 60000 -threads 5 -maxCharLength 100000 -quiet True -serverProperties corenlp_server-fd10ce87f3414e2b.props -preload coref
Traceback (most recent call last):
  File "D:\data\programmi\Python37\lib\site-packages\stanfordnlp\server\client.py", line 330, in _request
    r.raise_for_status()
  File "D:\data\programmi\Python37\lib\site-packages\requests\models.py", line 940, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 500 Server Error: Internal Server Error for url: http://localhost:9000/?properties=%7B%27outputFormat%27%3A+%27serialized%27%7D

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "test_mine_bugreport.py", line 5, in <module>
    ann = client.annotate(text)
  File "D:\data\programmi\Python37\lib\site-packages\stanfordnlp\server\client.py", line 398, in annotate
    r = self._request(text.encode('utf-8'), request_properties, **kwargs)
  File "D:\data\programmi\Python37\lib\site-packages\stanfordnlp\server\client.py", line 336, in _request
    raise AnnotationException(r.text)
stanfordnlp.server.client.AnnotationException: java.util.concurrent.ExecutionException: java.lang.NullPointerException

If I use the statistical algorithm instead of the neural one, the code works as expected.

ghost commented 5 years ago

The crash was caused by missing annotators that the coref annotator requires. Running the equivalent command directly in CoreNLP gives:

java -Xmx5g -cp stanford-corenlp-3.9.2.jar;stanford-corenlp-3.9.2-models.jar;* edu.stanford.nlp.pipeline.StanfordCoreNLP -annotators coref -coref.algorithm neural -file example_file.txt
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator coref
[main] INFO edu.stanford.nlp.coref.neural.NeuralCorefAlgorithm - Loading coref model edu/stanford/nlp/models/coref/neural/english-model-default.ser.gz ... done [0.4 sec].
[main] INFO edu.stanford.nlp.coref.neural.NeuralCorefAlgorithm - Loading coref embeddings edu/stanford/nlp/models/coref/neural/english-embeddings.ser.gz ... done [0.4 sec].
[main] INFO edu.stanford.nlp.pipeline.CorefMentionAnnotator - Using mention detector type: rule
Exception in thread "main" java.lang.IllegalArgumentException: annotator "coref" requires annotation "BasicDependenciesAnnotation". The usual requirements for this annotator are: tokenize,ssplit,pos,lemma,ner,depparse
        at edu.stanford.nlp.pipeline.StanfordCoreNLP.<init>(StanfordCoreNLP.java:260)
        at edu.stanford.nlp.pipeline.StanfordCoreNLP.<init>(StanfordCoreNLP.java:192)
        at edu.stanford.nlp.pipeline.StanfordCoreNLP.<init>(StanfordCoreNLP.java:188)
        at edu.stanford.nlp.pipeline.StanfordCoreNLP.main(StanfordCoreNLP.java:1388)

The CoreNLP error message is much more useful: it states directly what the problem is and which annotators are missing. The Python wrapper, by contrast, only surfaces an obscure NullPointerException.
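For anyone hitting the same thing, here is a minimal sketch of the kind of check that would give a readable error on the Python side before the request reaches the server. The names `USUAL_REQUIREMENTS`, `missing_prerequisites` and `check_annotators` are hypothetical and are not part of stanfordnlp. The prerequisite list for coref is copied from the CoreNLP message above, and the table only covers that one annotator.

```python
# Hypothetical client-side check; none of these names exist in stanfordnlp.
# The prerequisite list for "coref" is copied from the CoreNLP error message.
USUAL_REQUIREMENTS = {
    "coref": ["tokenize", "ssplit", "pos", "lemma", "ner", "depparse"],
}


def missing_prerequisites(annotators):
    """Map each annotator to the prerequisites that do not appear before it."""
    names = [a.strip() for a in annotators.split(",") if a.strip()]
    problems = {}
    for position, name in enumerate(names):
        earlier = names[:position]
        missing = [
            req for req in USUAL_REQUIREMENTS.get(name, []) if req not in earlier
        ]
        if missing:
            problems[name] = missing
    return problems


def check_annotators(properties):
    """Raise a readable ValueError if the annotator list is incomplete."""
    problems = missing_prerequisites(properties.get("annotators", ""))
    if problems:
        details = "; ".join(
            f'annotator "{name}" is missing {",".join(missing)}'
            for name, missing in problems.items()
        )
        raise ValueError(f"Incomplete annotator list: {details}")


# The properties from the MWE, and a version with the full annotator chain.
bad = {"annotators": "coref", "coref.algorithm": "neural"}
good = {
    "annotators": "tokenize,ssplit,pos,lemma,ner,depparse,coref",
    "coref.algorithm": "neural",
}

check_annotators(good)  # no exception: every prerequisite precedes coref
try:
    check_annotators(bad)
except ValueError as err:
    print(err)  # names the annotators that coref is missing
```

Passing the `good` properties to CoreNLPClient in the MWE should avoid the NullPointerException, since they list the annotators named in the CoreNLP message ahead of coref.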