NullPointerException #969

Closed: doug919 closed this issue 4 years ago

doug919 commented 4 years ago

Describe the bug: I use CoreNLP 3.9.2 as a remote server with StanfordNLP 0.2.0. I want to process pre-tokenized input by setting the following properties:

properties = {
        'tokenize.whitespace': True,
        'tokenize.keepeol': True,
        'ssplit.eolonly': True
}
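
For readers unfamiliar with these options, here is a minimal sketch of the segmentation they are meant to produce: one sentence per line, with tokens separated only by whitespace. This is plain Python written for illustration, not CoreNLP's implementation, and the helper names `to_pretokenized_text` and `expected_segmentation` are hypothetical.

```python
def to_pretokenized_text(sentences):
    """Join token lists into the one-sentence-per-line format described above."""
    return "\n".join(" ".join(tokens) for tokens in sentences)


def expected_segmentation(text):
    """Split on newlines for sentences and on whitespace for tokens.

    Illustrative only: this mimics the intended effect of
    tokenize.whitespace and ssplit.eolonly, not CoreNLP's actual code.
    """
    return [line.split() for line in text.split("\n") if line.strip()]


# The input text used in this report.
text = ("Chris Manning is a nice person.\n"
        "Chris wrote a simple sentence. He also gives oranges to people.\n"
        "test it.")
segmented = expected_segmentation(text)
for tokens in segmented:
    print(tokens)
```

Note that punctuation stays attached to the preceding word (for example `person.`), and the second line remains a single sentence even though it contains a period in the middle.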

I received the following NullPointerException.

Client-side:

starting up Java Stanford CoreNLP Server...
Starting server with command: java -Xmx16G -cp /myhome/stanford-corenlp-full-2018-10-05/* edu.stanford.nlp.pipeline.StanfordCoreNLPServer -port 9000 -timeout 30000 -threads 5 -maxCharLength 100000 -quiet True -serverProperties corenlp_server-5074c5d23e934dbf.props -preload tokenize,ssplit,pos,lemma,ner,parse,depparse,coref
Traceback (most recent call last):
  File "/myhome/.local/lib/python3.6/site-packages/stanfordnlp/server/", line 330, in _request
  File "/myhome/github_repo/requests/requests/", line 840, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 500 Server Error: Internal Server Error for url: http://localhost:9000/?properties=%7B%27outputFormat%27%3A+%27serialized%27%7D

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "", line 24, in <module>
    ann = client.annotate(text)
  File "/myhome/.local/lib/python3.6/site-packages/stanfordnlp/server/", line 398, in annotate
    r = self._request(text.encode('utf-8'), request_properties, **kwargs)
  File "/myhome/.local/lib/python3.6/site-packages/stanfordnlp/server/", line 336, in _request
    raise AnnotationException(r.text)
stanfordnlp.server.client.AnnotationException: java.util.concurrent.ExecutionException: java.lang.NullPointerException
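
As a side note on reading the client traceback, the percent-encoded query string in the failing URL can be decoded with the standard library. The sketch below shows that the per-request properties contain only the output format; the tokenizer and sentence-splitter settings do not appear in the request because they were handed to the server at startup through the `-serverProperties` file shown in the start command.

```python
from urllib.parse import urlsplit, parse_qs

# URL copied from the HTTPError message above.
url = ("http://localhost:9000/"
       "?properties=%7B%27outputFormat%27%3A+%27serialized%27%7D")

# parse_qs undoes the percent-encoding and turns '+' back into a space.
query = parse_qs(urlsplit(url).query)
request_properties = query["properties"][0]
print(request_properties)
```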


Server-side:

[pool-1-thread-3] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator coref
java.util.concurrent.ExecutionException: java.lang.NullPointerException
        at java.base/
        at java.base/java.util.concurrent.FutureTask.get(
        at edu.stanford.nlp.pipeline.StanfordCoreNLPServer$CoreNLPHandler.handle(
        at jdk.httpserver/$Chain.doFilter(
        at jdk.httpserver/
        at jdk.httpserver/$Chain.doFilter(
        at jdk.httpserver/$Exchange$LinkHandler.handle(
        at jdk.httpserver/$Chain.doFilter(
        at jdk.httpserver/$
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(
        at java.base/java.util.concurrent.ThreadPoolExecutor$
        at java.base/
Caused by: java.lang.NullPointerException
        at edu.stanford.nlp.pipeline.NERCombinerAnnotator.annotate(
        at edu.stanford.nlp.pipeline.AnnotationPipeline.annotate(
        at edu.stanford.nlp.pipeline.StanfordCoreNLP.annotate(
        at edu.stanford.nlp.pipeline.StanfordCoreNLPServer$CoreNLPHandler.lambda$handle$0(
        at java.base/
        ... 3 more
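
The useful part of the server trace is the last "Caused by:" section, which points at `NERCombinerAnnotator.annotate`. A minimal sketch of pulling that out of a Java stack trace programmatically follows; the helper name `root_cause` is hypothetical, and the sample lines are copied from the trace above with their frames truncated exactly as they appear there.

```python
TRACE = """java.util.concurrent.ExecutionException: java.lang.NullPointerException
        at java.base/java.util.concurrent.FutureTask.get(
        at edu.stanford.nlp.pipeline.StanfordCoreNLPServer$CoreNLPHandler.handle(
Caused by: java.lang.NullPointerException
        at edu.stanford.nlp.pipeline.NERCombinerAnnotator.annotate(
        at edu.stanford.nlp.pipeline.AnnotationPipeline.annotate(
"""


def root_cause(trace):
    """Return (exception class, first frame) of the innermost cause.

    Uses the last 'Caused by:' line if there is one, otherwise the
    first line of the trace.
    """
    lines = [ln.strip() for ln in trace.splitlines() if ln.strip()]
    start = 0
    for i, ln in enumerate(lines):
        if ln.startswith("Caused by:"):
            start = i
    header = lines[start]
    if header.startswith("Caused by:"):
        header = header[len("Caused by:"):].strip()
    # Drop any message that follows the exception class name.
    exception = header.split(":", 1)[0].strip()
    frame = None
    for ln in lines[start + 1:]:
        if ln.startswith("at "):
            frame = ln[3:].rstrip("(")
            break
    return exception, frame


exception, frame = root_cause(TRACE)
print(exception, "in", frame)
```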

To Reproduce: Run the following example, which is modified from the documentation example by adding the properties above and changing the input text to include end-of-line characters.

from stanfordnlp.server import CoreNLPClient

# example text
print('input text')

text = "Chris Manning is a nice person.\nChris wrote a simple sentence. He also gives oranges to people.\ntest it."


# set up the client
print('starting up Java Stanford CoreNLP Server...')

# properties for pre-tokenized input: whitespace tokens, one sentence per line
properties = {
        'tokenize.whitespace': True,
        'tokenize.keepeol': True,
        'ssplit.eolonly': True
}
with CoreNLPClient(annotators=['tokenize','ssplit','pos','lemma','ner', 'parse', 'depparse','coref'], timeout=30000, memory='16G', properties=properties) as client:
    # submit the request to the server
    ann = client.annotate(text)

    # get the first sentence
    sentence = ann.sentence[0]

    # get the constituency parse of the first sentence
    print('constituency parse of first sentence')
    constituency_parse = sentence.parseTree

    # get the first subtree of the constituency parse
    print('first subtree of constituency parse')

    # get the value of the first subtree
    print('value of first subtree of constituency parse')

    # get the dependency parse of the first sentence
    print('dependency parse of first sentence')
    dependency_parse = sentence.basicDependencies

    # get the first token of the first sentence
    print('first token of first sentence')
    token = sentence.token[0]

    # get the part-of-speech tag
    print('part of speech tag of token')

    # get the named entity tag
    print('named entity tag of token')

    # get an entity mention from the first sentence
    print('first entity mention in sentence')

    # access the coref chain
    print('coref chains for the example')

    # Use tokensregex patterns to find who wrote a sentence.
    pattern = '([ner: PERSON]+) /wrote/ /an?/ []{0,3} /sentence|article/'
    matches = client.tokensregex(text, pattern)
    # sentences contains a list with matches for each sentence.
    assert len(matches["sentences"]) == 3
    # length tells you whether or not there are any matches in this
    assert matches["sentences"][1]["length"] == 1
    # You can access matches like most regex groups.
    matches["sentences"][1]["0"]["text"] == "Chris wrote a simple sentence"
    matches["sentences"][1]["0"]["1"]["text"] == "Chris"

    # Use semgrex patterns to directly find who wrote what.
    pattern = '{word:wrote} >nsubj {}=subject >dobj {}=object'
    matches = client.semgrex(text, pattern)
    # sentences contains a list with matches for each sentence.
    assert len(matches["sentences"]) == 3
    # length tells you whether or not there are any matches in this
    assert matches["sentences"][1]["length"] == 1
    # You can access matches like most regex groups.
    matches["sentences"][1]["0"]["text"] == "wrote"
    matches["sentences"][1]["0"]["$subject"]["text"] == "Chris"
    matches["sentences"][1]["0"]["$object"]["text"] == "sentence"

Expected behavior: Serialized parsing results


Additional context: CoreNLP 3.9.1 does not have this issue.

AngledLuffa commented 4 years ago

I believe this is now fixed if you use the latest versions of Stanza and CoreNLP.
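
For anyone retrying this, a minimal sketch of the same request through Stanza's `stanza.server.CoreNLPClient` might look like the following. It assumes Stanza is installed and that the `CORENLP_HOME` environment variable points at a recent CoreNLP release; the helper name `annotate_pretokenized` and the shortened annotator list are illustrative choices, and this has not been verified against any particular version pair.

```python
# Properties from the original report: whitespace tokenization,
# one sentence per line.
PRETOKENIZED_PROPS = {
    'tokenize.whitespace': True,
    'tokenize.keepeol': True,
    'ssplit.eolonly': True,
}


def annotate_pretokenized(text,
                          annotators=('tokenize', 'ssplit', 'pos', 'lemma', 'ner')):
    """Annotate pre-tokenized text and return the tokens of each sentence.

    Assumes a local CoreNLP install that Stanza can start (CORENLP_HOME set).
    """
    # Imported lazily so the module still loads where Stanza is not installed.
    from stanza.server import CoreNLPClient

    with CoreNLPClient(annotators=list(annotators),
                       timeout=30000,
                       memory='16G',
                       properties=PRETOKENIZED_PROPS,
                       be_quiet=True) as client:
        ann = client.annotate(text)
        return [[token.word for token in sentence.token]
                for sentence in ann.sentence]
```

Calling `annotate_pretokenized("Chris Manning is a nice person.\ntest it.")` should return one token list per input line if the settings are honored, which is a quick way to check whether the NER step still fails.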