stanfordnlp / CoreNLP

CoreNLP: A Java suite of core NLP tools for tokenization, sentence segmentation, NER, parsing, coreference, sentiment analysis, etc.
http://stanfordnlp.github.io/CoreNLP/
GNU General Public License v3.0

Annotation errors for extraordinary names #178

Open Tooa opened 8 years ago

Tooa commented 8 years ago

Hi :wave: ,

Please take a look at the examples below, which were processed with corenlp.run.

The POS tagger and the coref annotator (mention detection) often fail on extraordinary names. In particular, the POS tagger often fails to recognize names as nouns, even when the name is the first word of the sentence.

I wonder whether this is related to the universal POS tags. It may be harder to assign the correct label with a smaller tag set; I haven't seen such POS-tagging errors with more fine-grained tag sets in the past.

Sue was nervous about taking the driver's test. She likes apples.

Sue is classified as VB and not recognized as a mention. As a result, Sue and She are not mapped to the same coreference chain.

Lira was so excited to meet her favorite rapper. She had backstage passes for after the concert.

Lira is correctly classified as NN, but not identified as a mention.

Coy needed new sneakers. She went to the store and examined their selection.

Coy is classified as JJ and therefore also not identified as a mention.

Sunny went with her family to a village. She likes apples.

Sunny is again classified as JJ and not identified as a mention.
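To make the four cases above easier to re-check, here is a minimal sketch that records the reported tags and lists the names that were not tagged as nouns. It does not call CoreNLP; the class `ReportedTagCheck`, its method names, and the rule that any tag starting with `NN` counts as a noun are assumptions made for illustration only. The tag values are copied from the observations above.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of CoreNLP: it only encodes the observations from this report.
final class ReportedTagCheck {

    // Sentence-initial name -> POS tag reported for it in the examples above.
    static Map<String, String> reportedTags() {
        Map<String, String> tags = new LinkedHashMap<>();
        tags.put("Sue", "VB");
        tags.put("Lira", "NN");
        tags.put("Coy", "JJ");
        tags.put("Sunny", "JJ");
        return tags;
    }

    // Assumption: a tag beginning with "NN" (NN, NNS, NNP, NNPS) is treated as a noun tag.
    static boolean isNounTag(String tag) {
        return tag != null && tag.startsWith("NN");
    }

    // Collects the names whose reported tag is not a noun tag, in insertion order.
    static List<String> namesNotTaggedAsNoun(Map<String, String> tags) {
        List<String> failures = new ArrayList<>();
        for (Map.Entry<String, String> entry : tags.entrySet()) {
            if (!isNounTag(entry.getValue())) {
                failures.add(entry.getKey());
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        // Prints [Sue, Coy, Sunny]
        System.out.println(namesNotTaggedAsNoun(reportedTags()));
    }
}
```

Lira passes this check because NN is a noun tag, yet it was still not identified as a mention, so a wrong POS tag does not account for every missed mention.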

J38 commented 8 years ago

Thank you for identifying these error cases; we will dig into them further. I am actually working on a new POS tagger and NER tagger for the toolkit, so hopefully issues like these will be resolved!

Tooa commented 8 years ago

Are you planning to release the new tagger soon? By the way, here's another challenging sentence:

She sent him a present.

The current POS tagger classifies the token "present" as an adjective, although it is a noun in this context. Maybe your new implementation will be able to resolve this case as well.