stanfordnlp / CoreNLP

CoreNLP: A Java suite of core NLP tools for tokenization, sentence segmentation, NER, parsing, coreference, sentiment analysis, etc.
http://stanfordnlp.github.io/CoreNLP/
GNU General Public License v3.0

NullRef in DeterministicCorefSieve.sortMentionsForPronoun #31

Closed: Necrolis closed this issue 10 years ago

Necrolis commented 10 years ago

When using the caseless pos-tagger, it is possible to trigger a null-reference exception in edu.stanford.nlp.dcoref.sievepasses.DeterministicCorefSieve.sortMentionsForPronoun when there is a dangling pronoun.

A simple repro-case using a simplified tweet that can trigger the exception: rt @bob: I really hate fifa 2015. ya

which yields this trace:

Exception in thread "main" java.lang.RuntimeException: Error annotating C:\Users\***\Desktop\stanford-corenlp-full-2014-06-16\input.txt
        at edu.stanford.nlp.pipeline.StanfordCoreNLP$15.run(StanfordCoreNLP.java:1288)
        at edu.stanford.nlp.pipeline.StanfordCoreNLP.processFiles(StanfordCoreNLP.java:1348)
        at edu.stanford.nlp.pipeline.StanfordCoreNLP.run(StanfordCoreNLP.java:1390)
        at edu.stanford.nlp.pipeline.StanfordCoreNLP.main(StanfordCoreNLP.java:1460)
Caused by: java.lang.NullPointerException
        at edu.stanford.nlp.dcoref.sievepasses.DeterministicCorefSieve.sortMentionsForPronoun(DeterministicCorefSieve.java:482)
        at edu.stanford.nlp.dcoref.sievepasses.DeterministicCorefSieve.getOrderedAntecedents(DeterministicCorefSieve.java:464)
        at edu.stanford.nlp.dcoref.SieveCoreferenceSystem.coreference(SieveCoreferenceSystem.java:898)
        at edu.stanford.nlp.dcoref.SieveCoreferenceSystem.coref(SieveCoreferenceSystem.java:845)
        at edu.stanford.nlp.pipeline.DeterministicCorefAnnotator.annotate(DeterministicCorefAnnotator.java:121)
        at edu.stanford.nlp.pipeline.AnnotationPipeline.annotate(AnnotationPipeline.java:67)
        at edu.stanford.nlp.pipeline.StanfordCoreNLP.annotate(StanfordCoreNLP.java:848)
        at edu.stanford.nlp.pipeline.StanfordCoreNLP$15.run(StanfordCoreNLP.java:1276)
        ... 3 more

Admittedly this is not correct English in any way; however, it would be nice to see a little more robustness in the system :)

AngledLuffa commented 10 years ago

As much as I love a good bug report (by which I mean one that is as well documented as yours), I am unable to reproduce the bug:

foo.txt:

rt @bob: I really hate fifa 2015. ya

java edu.stanford.nlp.pipeline.StanfordCoreNLP -file foo.txt -pos.model /u/nlp/data/pos-tagger/distrib/english-caseless-left3words-distsim.tagger

Necrolis commented 10 years ago

Seems I forgot a vital bit, which I hadn't realized till now (as I didn't test with other parse models); this error only occurs when the SR parser is used (via parse.model=edu/stanford/nlp/models/srparser/englishSR.ser.gz).
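For anyone trying to reproduce this, the combination described above can be sketched as a plain `java.util.Properties` object. This is only a sketch under stated assumptions: the class and method names are hypothetical, the tagger path is a placeholder to be pointed at wherever the caseless tagger lives locally, and the annotator list is the usual one for a dcoref run rather than something taken from this report. The block only builds the settings; handing them to `new StanfordCoreNLP(props)` needs the CoreNLP and model jars on the classpath.

```java
import java.util.Properties;

// Hypothetical helper: collects the settings that appear to matter for this bug.
class SrParserReproConfig {
    // SR parser model path as quoted in the report.
    static final String SR_PARSER_MODEL =
        "edu/stanford/nlp/models/srparser/englishSR.ser.gz";

    // caselessTaggerPath is a placeholder argument; adjust it to your install.
    static Properties reproProperties(String caselessTaggerPath) {
        Properties props = new Properties();
        // Assumed annotator list for a pipeline that ends in dcoref.
        props.setProperty("annotators",
            "tokenize, ssplit, pos, lemma, ner, parse, dcoref");
        props.setProperty("pos.model", caselessTaggerPath);
        props.setProperty("parse.model", SR_PARSER_MODEL);
        return props;
    }

    public static void main(String[] args) {
        Properties props =
            reproProperties("english-caseless-left3words-distsim.tagger");
        // Print the settings so they can be copied into a properties file.
        props.forEach((key, value) -> System.out.println(key + " = " + value));
    }
}
```

Running the pipeline with these settings on the one-line tweet from the original report should exercise the same code path, assuming the June 16th or July 1st models are in use.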

AngledLuffa commented 10 years ago

Thanks for pointing this out. There was a weird case where the parser produced a tree that didn't have ROOT on top, followed by a usage of the parse trees in coref where the ending condition was ROOT or null, but it checked at the end of the loop instead of the start (presumably thinking no tree would ever fail to have ROOT at top). I have checked in a fix for the second issue and will try to fix the first issue as well.
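The loop problem described above can be illustrated with a small, self-contained sketch. It is not the actual CoreNLP code: `Node`, `labelsCheckAtEnd`, and `labelsCheckAtStart` are hypothetical names, and the tree is a bare parent-pointer structure rather than CoreNLP's `Tree` class. The first method tests for ROOT or null only after the loop body has run, so a node with no parent at all triggers a `NullPointerException`; the second tests before touching the node.

```java
import java.util.ArrayList;
import java.util.List;

class AncestorWalk {
    // Hypothetical minimal tree node with a parent pointer.
    static final class Node {
        final String label;
        final Node parent;
        Node(String label, Node parent) {
            this.label = label;
            this.parent = parent;
        }
    }

    // Fragile shape: the exit condition is checked at the end of the loop,
    // which silently assumes every tree has something above the start node.
    static List<String> labelsCheckAtEnd(Node start) {
        List<String> labels = new ArrayList<>();
        Node current = start.parent;
        do {
            labels.add(current.label); // throws if start had no parent
            current = current.parent;
        } while (current != null && !"ROOT".equals(current.label));
        return labels;
    }

    // Safer shape: the exit condition is checked before each use.
    static List<String> labelsCheckAtStart(Node start) {
        List<String> labels = new ArrayList<>();
        Node current = start.parent;
        while (current != null && !"ROOT".equals(current.label)) {
            labels.add(current.label);
            current = current.parent;
        }
        return labels;
    }

    public static void main(String[] args) {
        // A fragment with no ROOT (and no parent) above it.
        Node dangling = new Node("PRP", null);
        System.out.println(labelsCheckAtStart(dangling));
        try {
            labelsCheckAtEnd(dangling);
        } catch (NullPointerException e) {
            System.out.println("check-at-end variant failed on a rootless node");
        }
    }
}
```

On a well-formed tree both variants collect the same ancestors; they differ only when the start node has nothing above it, which is the kind of input the fix has to tolerate.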

John

Necrolis commented 10 years ago

Awesome! I've managed to collect quite a few examples that trigger this condition (some of which I think constitute correct grammatical forms), so just give me the heads up when I can test the changes out.

On a sort of side note: I hadn't noticed that the SR models received an update (your site doesn't make this too clear; I only realized there was an update from the mailing list). Comparing the June 16th models with the July 1st models, I noticed the latter avoid crashing on a few cases we found. The example given earlier, however, still produces the crash.

AngledLuffa commented 10 years ago

If I remember correctly, you had pointed out some other bug which I fixed by putting out a new set of models and calling them the 3.4.0 models without actually changing the version number.

You should be able to test the changes from github already.

Necrolis commented 10 years ago

Did some testing with the new changes; everything seems to be running smoothly :)

Thanks for the quick fix!