stanfordnlp / CoreNLP

CoreNLP: A Java suite of core NLP tools for tokenization, sentence segmentation, NER, parsing, coreference, sentiment analysis, etc.
http://stanfordnlp.github.io/CoreNLP/
GNU General Public License v3.0

NullPointerException on sentence. #68

Closed paul-english closed 9 years ago

paul-english commented 9 years ago

I'm getting a NullPointerException when trying to parse "People are known to be at or near places" through the shell.

stanford-corenlp-full-2015-04-20>  ./corenlp.sh -threads 4 -annotators tokenize,ssplit,pos,lemma,ner,parse
java -mx3g -cp "./*" edu.stanford.nlp.pipeline.StanfordCoreNLP -threads 4 -annotators tokenize,ssplit,pos,lemma,ner,parse
Adding annotator tokenize
TokenizerAnnotator: No tokenizer type provided. Defaulting to PTBTokenizer.
Adding annotator ssplit
Adding annotator pos
Reading POS tagger model from edu/stanford/nlp/models/pos-tagger/english-left3words/english-left3words-distsim.tagger ... done [0.6 sec].
Adding annotator lemma
Adding annotator ner
Loading classifier from edu/stanford/nlp/models/ner/english.all.3class.distsim.crf.ser.gz ... done [4.7 sec].
Loading classifier from edu/stanford/nlp/models/ner/english.muc.7class.distsim.crf.ser.gz ... done [2.9 sec].
Loading classifier from edu/stanford/nlp/models/ner/english.conll.4class.distsim.crf.ser.gz ... done [2.3 sec].
Initializing JollyDayHoliday for SUTime from classpath: edu/stanford/nlp/models/sutime/jollyday/Holidays_sutime.xml as sutime.binder.1.
Reading TokensRegex rules from edu/stanford/nlp/models/sutime/defs.sutime.txt
Reading TokensRegex rules from edu/stanford/nlp/models/sutime/english.sutime.txt
Reading TokensRegex rules from edu/stanford/nlp/models/sutime/english.holidays.sutime.txt
Adding annotator parse
Loading parser from serialized file edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz ... done [0.7 sec].

Entering interactive shell. Type q RETURN or EOF to quit.
NLP> People are known to be at or near places
Exception in thread "main" java.lang.NullPointerException
    at edu.stanford.nlp.semgraph.semgrex.GraphRelation$GOVERNER$1.advance(GraphRelation.java:271)
    at edu.stanford.nlp.semgraph.semgrex.GraphRelation$SearchNodeIterator.initialize(GraphRelation.java:929)
    at edu.stanford.nlp.semgraph.semgrex.GraphRelation$SearchNodeIterator.<init>(GraphRelation.java:910)
    at edu.stanford.nlp.semgraph.semgrex.GraphRelation$GOVERNER$1.<init>(GraphRelation.java:257)
    at edu.stanford.nlp.semgraph.semgrex.GraphRelation$GOVERNER.searchNodeIterator(GraphRelation.java:257)
    at edu.stanford.nlp.semgraph.semgrex.NodePattern$NodeMatcher.resetChildIter(NodePattern.java:273)
    at edu.stanford.nlp.semgraph.semgrex.SemgrexMatcher.resetChildIter(SemgrexMatcher.java:76)
    at edu.stanford.nlp.semgraph.semgrex.CoordinationPattern$CoordinationMatcher.resetChildIter(CoordinationPattern.java:170)
    at edu.stanford.nlp.semgraph.semgrex.NodePattern$NodeMatcher.resetChild(NodePattern.java:297)
    at edu.stanford.nlp.semgraph.semgrex.NodePattern$NodeMatcher.goToNextNodeMatch(NodePattern.java:394)
    at edu.stanford.nlp.semgraph.semgrex.NodePattern$NodeMatcher.matches(NodePattern.java:511)
    at edu.stanford.nlp.semgraph.semgrex.SemgrexMatcher.find(SemgrexMatcher.java:155)
    at edu.stanford.nlp.trees.UniversalEnglishGrammaticalStructure.addCaseMarkerInformation(UniversalEnglishGrammaticalStructure.java:225)
    at edu.stanford.nlp.trees.UniversalEnglishGrammaticalStructure.collapseDependencies(UniversalEnglishGrammaticalStructure.java:817)
    at edu.stanford.nlp.trees.GrammaticalStructure.typedDependenciesCollapsed(GrammaticalStructure.java:877)
    at edu.stanford.nlp.semgraph.SemanticGraphFactory.makeFromTree(SemanticGraphFactory.java:188)
    at edu.stanford.nlp.semgraph.SemanticGraphFactory.generateCollapsedDependencies(SemanticGraphFactory.java:90)
    at edu.stanford.nlp.pipeline.ParserAnnotatorUtils.fillInParseAnnotations(ParserAnnotatorUtils.java:51)
    at edu.stanford.nlp.pipeline.ParserAnnotator.finishSentence(ParserAnnotator.java:266)
    at edu.stanford.nlp.pipeline.ParserAnnotator.doOneSentence(ParserAnnotator.java:245)
    at edu.stanford.nlp.pipeline.SentenceAnnotator.annotate(SentenceAnnotator.java:96)
    at edu.stanford.nlp.pipeline.AnnotationPipeline.annotate(AnnotationPipeline.java:68)
    at edu.stanford.nlp.pipeline.StanfordCoreNLP.annotate(StanfordCoreNLP.java:412)
    at edu.stanford.nlp.pipeline.StanfordCoreNLP.process(StanfordCoreNLP.java:441)
    at edu.stanford.nlp.pipeline.StanfordCoreNLP.shell(StanfordCoreNLP.java:678)
    at edu.stanford.nlp.pipeline.StanfordCoreNLP.run(StanfordCoreNLP.java:1016)
    at edu.stanford.nlp.pipeline.StanfordCoreNLP.main(StanfordCoreNLP.java:1062)

AngledLuffa commented 9 years ago

I investigated this out of concern that my optimization of Semgrex had caused it. I found that after expandPrepConjunctions runs, there is an edge with a (null) relation:

known/VBN -> People/NNS (nsubjpass)
known/VBN -> are/VBP (auxpass)
known/VBN -> known/VBN' (conj:or)
known/VBN -> places/NNS (xcomp)
known/VBN' -> places/NNS (null)
at/IN -> or/CC (cc)
at/IN -> near/IN (conj)
places/NNS -> to/TO (mark)
places/NNS -> be/VB (cop)
places/NNS -> at/IN (case)

The problem comes from getCaseMarkedRelation, which returns (null) for this example sentence.
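To illustrate the failure mode described here, this is a minimal, self-contained Java sketch. It does not use CoreNLP and is not the fix that closed this issue. `Edge`, `sampleEdges`, `findNullRelations`, and `hasRelation` are hypothetical names invented for the example; only the edge list itself is taken from the dump in this comment. It shows how a single edge with a null relation can be detected up front, and how a relation comparison can be written so that it does not dereference null.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: Edge is a hypothetical stand-in, not a CoreNLP class.
final class NullRelationSketch {

    /** A dependency edge; relation may be null, as in the graph dumped in this thread. */
    static final class Edge {
        final String governor;
        final String dependent;
        final String relation;

        Edge(String governor, String dependent, String relation) {
            this.governor = governor;
            this.dependent = dependent;
            this.relation = relation;
        }

        @Override
        public String toString() {
            return governor + " -> " + dependent + " (" + relation + ")";
        }
    }

    /** The ten edges reported in the thread, with the missing relation stored as null. */
    static List<Edge> sampleEdges() {
        List<Edge> edges = new ArrayList<>();
        edges.add(new Edge("known/VBN", "People/NNS", "nsubjpass"));
        edges.add(new Edge("known/VBN", "are/VBP", "auxpass"));
        edges.add(new Edge("known/VBN", "known/VBN'", "conj:or"));
        edges.add(new Edge("known/VBN", "places/NNS", "xcomp"));
        edges.add(new Edge("known/VBN'", "places/NNS", null));
        edges.add(new Edge("at/IN", "or/CC", "cc"));
        edges.add(new Edge("at/IN", "near/IN", "conj"));
        edges.add(new Edge("places/NNS", "to/TO", "mark"));
        edges.add(new Edge("places/NNS", "be/VB", "cop"));
        edges.add(new Edge("places/NNS", "at/IN", "case"));
        return edges;
    }

    /** Collects edges that have no relation so they can be reported before any matching runs. */
    static List<Edge> findNullRelations(List<Edge> edges) {
        List<Edge> missing = new ArrayList<>();
        for (Edge edge : edges) {
            if (edge.relation == null) {
                missing.add(edge);
            }
        }
        return missing;
    }

    /** Null-safe test: edge.relation.equals(name) would throw on the null edge. */
    static boolean hasRelation(Edge edge, String name) {
        return name.equals(edge.relation);
    }

    public static void main(String[] args) {
        for (Edge bad : findNullRelations(sampleEdges())) {
            System.out.println("Edge without relation: " + bad);
        }
    }
}
```

Calling the comparison on the constant (`name.equals(edge.relation)`) rather than on the field is what keeps the check safe here. Whether a null relation should be tolerated or rejected at construction time is a separate design question that this sketch does not settle.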

On Sun, Apr 26, 2015 at 6:13 AM, Paul English notifications@github.com wrote:

I'm getting a NullPointerException when trying to parse "People are known to be at or near places" through the shell.


AngledLuffa commented 9 years ago

I didn't want to make that change out of fear of it not being "correct", but I take it that is the right answer?

On Mon, Apr 27, 2015 at 5:48 PM, Gabor Angeli notifications@github.com wrote:

Closed #68 (https://github.com/stanfordnlp/CoreNLP/issues/68) via commit 60cc8f7 (https://github.com/stanfordnlp/CoreNLP/commit/60cc8f78f6472a98da41c242ed41f7af6563418d).
