stanfordnlp / CoreNLP

CoreNLP: A Java suite of core NLP tools for tokenization, sentence segmentation, NER, parsing, coreference, sentiment analysis, etc.
http://stanfordnlp.github.io/CoreNLP/
GNU General Public License v3.0
9.63k stars 2.7k forks source link

NullPointerException #969

Closed doug919 closed 4 years ago

doug919 commented 4 years ago

Describe the bug I use CoreNLP 3.9.2 as a remote server with StanfordNLP 0.2.0. I want to work on pre-tokenized inputs by specifying the following properties

properties = {
        'tokenize.whitespace': True,
        'tokenize.keepeol': True,
        'ssplit.eolonly': True
        }

I received the following NullPointer error: Client-side

starting up Java Stanford CoreNLP Server...
Starting server with command: java -Xmx16G -cp /myhome/stanford-corenlp-full-2018-10-05/* edu.stanford.nlp.pipeline.StanfordCoreNLPServer -port 9000 -timeout 30000 -threads 5 -maxCharLength 100000 -quiet True -serverProperties corenlp_server-5074c5d23e934dbf.props -preload tokenize,ssplit,pos,lemma,ner,parse,depparse,coref
Traceback (most recent call last):
  File "/myhome/.local/lib/python3.6/site-packages/stanfordnlp/server/client.py", line 330, in _request
    r.raise_for_status()
  File "/myhome/github_repo/requests/requests/models.py", line 840, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 500 Server Error: Internal Server Error for url: http://localhost:9000/?properties=%7B%27outputFormat%27%3A+%27serialized%27%7D

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "test.py", line 24, in <module>
    ann = client.annotate(text)
  File "/myhome/.local/lib/python3.6/site-packages/stanfordnlp/server/client.py", line 398, in annotate
    r = self._request(text.encode('utf-8'), request_properties, **kwargs)
  File "/myhome/.local/lib/python3.6/site-packages/stanfordnlp/server/client.py", line 336, in _request
    raise AnnotationException(r.text)
stanfordnlp.server.client.AnnotationException: java.util.concurrent.ExecutionException: java.lang.NullPointerException

Server-side

[pool-1-thread-3] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator coref
java.util.concurrent.ExecutionException: java.lang.NullPointerException
        at java.base/java.util.concurrent.FutureTask.report(FutureTask.java:122)
        at java.base/java.util.concurrent.FutureTask.get(FutureTask.java:205)
        at edu.stanford.nlp.pipeline.StanfordCoreNLPServer$CoreNLPHandler.handle(StanfordCoreNLPServer.java:870)
        at jdk.httpserver/com.sun.net.httpserver.Filter$Chain.doFilter(Filter.java:77)
        at jdk.httpserver/sun.net.httpserver.AuthFilter.doFilter(AuthFilter.java:82)
        at jdk.httpserver/com.sun.net.httpserver.Filter$Chain.doFilter(Filter.java:80)
        at jdk.httpserver/sun.net.httpserver.ServerImpl$Exchange$LinkHandler.handle(ServerImpl.java:692)
        at jdk.httpserver/com.sun.net.httpserver.Filter$Chain.doFilter(Filter.java:77)
        at jdk.httpserver/sun.net.httpserver.ServerImpl$Exchange.run(ServerImpl.java:664)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
        at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: java.lang.NullPointerException
        at edu.stanford.nlp.pipeline.NERCombinerAnnotator.annotate(NERCombinerAnnotator.java:322)
        at edu.stanford.nlp.pipeline.AnnotationPipeline.annotate(AnnotationPipeline.java:76)
        at edu.stanford.nlp.pipeline.StanfordCoreNLP.annotate(StanfordCoreNLP.java:637)
        at edu.stanford.nlp.pipeline.StanfordCoreNLPServer$CoreNLPHandler.lambda$handle$0(StanfordCoreNLPServer.java:857)
        at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
        ... 3 more

To Reproduce Run the following example, which is modified from the document example by adding the properties and change the input texts with EOL.

from stanfordnlp.server import CoreNLPClient

# example text
print('---')
print('input text')
print('')

text = "Chris Manning is a nice person.\nChris wrote a simple sentence. He also gives oranges to people.\ntest it."

print(text)

# set up the client
print('---')
print('starting up Java Stanford CoreNLP Server...')

# set up the client
properties = {
        'tokenize.whitespace': True,
        'tokenize.keepeol': True,
        'ssplit.eolonly': True
        }
with CoreNLPClient(annotators=['tokenize','ssplit','pos','lemma','ner', 'parse', 'depparse','coref'], timeout=30000, memory='16G', properties=properties) as client:
    # submit the request to the server
    ann = client.annotate(text)

    # get the first sentence
    sentence = ann.sentence[0]

    # get the constituency parse of the first sentence
    print('---')
    print('constituency parse of first sentence')
    constituency_parse = sentence.parseTree
    print(constituency_parse)

    # get the first subtree of the constituency parse
    print('---')
    print('first subtree of constituency parse')
    print(constituency_parse.child[0])

    # get the value of the first subtree
    print('---')
    print('value of first subtree of constituency parse')
    print(constituency_parse.child[0].value)

    # get the dependency parse of the first sentence
    print('---')
    print('dependency parse of first sentence')
    dependency_parse = sentence.basicDependencies
    print(dependency_parse)

    # get the first token of the first sentence
    print('---')
    print('first token of first sentence')
    token = sentence.token[0]
    print(token)

    # get the part-of-speech tag
    print('---')
    print('part of speech tag of token')
    token.pos
    print(token.pos)

    # get the named entity tag
    print('---')
    print('named entity tag of token')
    print(token.ner)

    # get an entity mention from the first sentence
    print('---')
    print('first entity mention in sentence')
    print(sentence.mentions[0])

    # access the coref chain
    print('---')
    print('coref chains for the example')
    print(ann.corefChain)

    # Use tokensregex patterns to find who wrote a sentence.
    pattern = '([ner: PERSON]+) /wrote/ /an?/ []{0,3} /sentence|article/'
    matches = client.tokensregex(text, pattern)
    # sentences contains a list with matches for each sentence.
    assert len(matches["sentences"]) == 3
    # length tells you whether or not there are any matches in this
    assert matches["sentences"][1]["length"] == 1
    # You can access matches like most regex groups.
    matches["sentences"][1]["0"]["text"] == "Chris wrote a simple sentence"
    matches["sentences"][1]["0"]["1"]["text"] == "Chris"

    # Use semgrex patterns to directly find who wrote what.
    pattern = '{word:wrote} >nsubj {}=subject >dobj {}=object'
    matches = client.semgrex(text, pattern)
    # sentences contains a list with matches for each sentence.
    assert len(matches["sentences"]) == 3
    # length tells you whether or not there are any matches in this
    assert matches["sentences"][1]["length"] == 1
    # You can access matches like most regex groups.
    matches["sentences"][1]["0"]["text"] == "wrote"
    matches["sentences"][1]["0"]["$subject"]["text"] == "Chris"
    matches["sentences"][1]["0"]["$object"]["text"] == "sentence"

Expected behavior Serialized parsing results

Environment (please complete the following information):

Additional context CoreNLP 3.9.1 doesn't have this issue.

AngledLuffa commented 4 years ago

I believe this is now fixed if you use the latest versions of stanza and corenlp.