stanfordnlp / CoreNLP

CoreNLP: A Java suite of core NLP tools for tokenization, sentence segmentation, NER, parsing, coreference, sentiment analysis, etc.
http://stanfordnlp.github.io/CoreNLP/
GNU General Public License v3.0

Non-uniform tokenization of sentences having dialogue #223

Open NikhilPr95 opened 8 years ago

NikhilPr95 commented 8 years ago

A sentence that contains both quoted and non-quoted words is not split into sentences consistently.

Given sentences such as: "Where were you?" asked Mary angrily.

Roughly half of such inputs are kept as one sentence:

  1. "Where were you?" asked Mary angrily.

and the other half are split into two:

  1. "Where were you?"
  2. asked Mary angrily.

This occurs when the following code is run (on the most recent version):

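    import java.util.List;
    import java.util.Properties;
    import edu.stanford.nlp.ling.CoreAnnotations.SentencesAnnotation;
    import edu.stanford.nlp.pipeline.Annotation;
    import edu.stanford.nlp.pipeline.StanfordCoreNLP;
    import edu.stanford.nlp.util.CoreMap;

    // Build a pipeline that tokenizes and sentence-splits before the later annotators.
    Properties props = new Properties();
    props.setProperty("annotators", "tokenize, ssplit, pos, lemma, ner, depparse");
    StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

    // doc holds the input text, e.g. the example sentence above.
    Annotation document = new Annotation(doc);
    pipeline.annotate(document);

    // One CoreMap per detected sentence; the list size shows how the text was split.
    List<CoreMap> sentences = document.get(SentencesAnnotation.class);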
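
To put a number on "roughly half", one option is to tally how many sentences each input produces. The sketch below is only an illustration: `SplitTally`, `tally` and `toySplit` are hypothetical names, and `toySplit` is a stand-in heuristic so the example runs without CoreNLP. It is not CoreNLP's splitting algorithm. With the real pipeline, the splitter argument would presumably wrap `pipeline.annotate(...)` and return one string per `CoreMap`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;

// Hypothetical helper for this issue; not part of CoreNLP.
class SplitTally {

    // Maps "number of sentences produced" to "how many inputs produced that many".
    static Map<Integer, Integer> tally(List<String> inputs,
                                       Function<String, List<String>> splitter) {
        Map<Integer, Integer> counts = new TreeMap<>();
        for (String input : inputs) {
            counts.merge(splitter.apply(input).size(), 1, Integer::sum);
        }
        return counts;
    }

    // Toy stand-in splitter (an assumption, NOT CoreNLP's behavior): break after
    // '.', '?' or '!' plus an optional closing quote, but only when the next
    // word starts with a capital letter.
    static List<String> toySplit(String text) {
        List<String> sentences = new ArrayList<>();
        for (String piece : text.split("(?<=[.?!]\"?)\\s+(?=\"?[A-Z])")) {
            if (!piece.isEmpty()) {
                sentences.add(piece);
            }
        }
        return sentences;
    }
}
```

Calling `SplitTally.tally(inputs, SplitTally::toySplit)` on a list of dialogue sentences returns the distribution of sentence counts, which makes it easy to spot which inputs end up in the "two sentences" bucket.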

ngrdtcs commented 7 years ago

I found a possible solution, and it works fine. After submitting the fix, though, I get the automated test failure below. Can anyone suggest what it implies?

    testMultiParagraphQuoteSingle(edu.stanford.nlp.pipeline.QuoteAnnotatorTest) Time elapsed: 0.008 sec <<< FAILURE!
    junit.framework.AssertionFailedError: expected:<1> but was:<2>
        at junit.framework.Assert.fail(Assert.java:57)
        at junit.framework.Assert.failNotEquals(Assert.java:329)
        at junit.framework.Assert.assertEquals(Assert.java:78)
        at junit.framework.Assert.assertEquals(Assert.java:234)
        at junit.framework.Assert.assertEquals(Assert.java:241)
        at junit.framework.TestCase.assertEquals(TestCase.java:409)
        at edu.stanford.nlp.pipeline.QuoteAnnotatorTest.assertInnerAnnotationValues(QuoteAnnotatorTest.java:471)
        at edu.stanford.nlp.pipeline.QuoteAnnotatorTest.testMultiParagraphQuoteSingle(QuoteAnnotatorTest.java:383)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
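
On how to read this failure: in JUnit's `assertEquals(expected, actual)` the first argument is the value the test expects and the second is what the code produced, so `expected:<1> but was:<2>` means some integer checked in `assertInnerAnnotationValues` (QuoteAnnotatorTest.java:471, reached from `testMultiParagraphQuoteSingle`) came out as 2 where the test wanted 1. The trace does not show what is being compared there. It may be that a quote now covers two sentences instead of one after the change, but that is only a guess. The sketch below mimics the message format without depending on JUnit; `MiniAssert` is a hypothetical stand-in, not JUnit's own class.

```java
// Hypothetical stand-in for JUnit's assertEquals(int expected, int actual).
class MiniAssert {

    // Throws with a JUnit-style message when the two values differ.
    static void assertEquals(int expected, int actual) {
        if (expected != actual) {
            throw new AssertionError(
                "expected:<" + expected + "> but was:<" + actual + ">");
        }
    }

    public static void main(String[] args) {
        int expectedCount = 1; // value the test was written to expect
        int actualCount = 2;   // value produced by the changed code
        try {
            assertEquals(expectedCount, actualCount);
        } catch (AssertionError e) {
            System.out.println(e.getMessage()); // prints "expected:<1> but was:<2>"
        }
    }
}
```

Opening QuoteAnnotatorTest.java at line 471 should show which value is asserted, and from there whether the test expectation or the fix needs adjusting.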