stanfordnlp / CoreNLP

CoreNLP: A Java suite of core NLP tools for tokenization, sentence segmentation, NER, parsing, coreference, sentiment analysis, etc.
http://stanfordnlp.github.io/CoreNLP/
GNU General Public License v3.0
9.72k stars 2.7k forks

POS Tagging Errors #680

Open J38 opened 6 years ago

J38 commented 6 years ago

The statistical model for part of speech tagging is not perfect due to both the limitations of the algorithm and training data. We should use this thread to catalog errors that users identify.

J38 commented 6 years ago

Some issues brought up by @jeffrschneider:

https://github.com/stanfordnlp/CoreNLP/issues/578 https://github.com/stanfordnlp/CoreNLP/issues/580 https://github.com/stanfordnlp/CoreNLP/issues/656 https://github.com/stanfordnlp/CoreNLP/issues/675

J38 commented 6 years ago

More issues with POS tagging:

https://github.com/stanfordnlp/CoreNLP/issues/576 https://github.com/stanfordnlp/CoreNLP/issues/597 https://github.com/stanfordnlp/CoreNLP/issues/610

J38 commented 6 years ago

More issues with POS tagging:

https://github.com/stanfordnlp/CoreNLP/issues/575

ndvbd commented 5 years ago

Any updates on these issues? Were they solved in StanfordNLP?

victoriastuart commented 4 years ago

I noticed that coreference resolution fails on 'Victoria lives in Vancouver. She likes apples.' >> corefs: {}

Sent  index WORD             LEMMA            POS              (Regex)NER     
   0  1     Victoria         Victoria         NNP              CITY           
   0  2     lives            live             VBZ              O              
   0  3     in               in               IN               O              
   0  4     Vancouver        Vancouver        NNP              CITY           
   0  5     .                .                .                O              
   1  1     She              she              PRP              O              
   1  2     likes            like             VBZ              O              
   1  3     apples           apple            NNS              O              
   1  4     .                .                .                O              

Coreference resolution succeeds on 'Victoria lived in Vancouver. She likes apples.' >> corefs: "She" --> "Victoria"

Sent  index WORD             LEMMA            POS              (Regex)NER     
   0  1     Victoria         Victoria         NNP              PERSON         
   0  2     lived            live             VBD              O              
   0  3     in               in               IN               O              
   0  4     Vancouver        Vancouver        NNP              CITY           
   0  5     .                .                .                O              
   1  1     She              she              PRP              O              
   1  2     likes            like             VBZ              O              
   1  3     apples           apple            NNS              O              
   1  4     .                .                .                O              

"Live" can be a verb (lives in | lived in), an adjective (live performance), or a noun (our lives matter). Hence I initially thought that misidentified verbs (or even verb tense) were causing the coreference failure above -- but looking at the tags, I realized it was in fact an NER issue.
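To make that comparison concrete, here is a minimal sketch that diffs the POS and NER columns of the first sentence from the two runs above. The rows are copied from the tables; the function name `tag_differences` is just an illustrative helper, not part of CoreNLP.

```python
# Rows copied from the two tables above: (word, POS, NER) for sentence 0.
lives_run = [
    ("Victoria", "NNP", "CITY"),
    ("lives", "VBZ", "O"),
    ("in", "IN", "O"),
    ("Vancouver", "NNP", "CITY"),
    (".", ".", "O"),
]
lived_run = [
    ("Victoria", "NNP", "PERSON"),
    ("lived", "VBD", "O"),
    ("in", "IN", "O"),
    ("Vancouver", "NNP", "CITY"),
    (".", ".", "O"),
]


def tag_differences(run_a, run_b):
    """Return (token index, column, value_a, value_b) for every POS or NER mismatch."""
    diffs = []
    for position, (row_a, row_b) in enumerate(zip(run_a, run_b), start=1):
        for column, index in (("POS", 1), ("NER", 2)):
            if row_a[index] != row_b[index]:
                diffs.append((position, column, row_a[index], row_b[index]))
    return diffs


differences = tag_differences(lives_run, lived_run)
for position, column, tag_a, tag_b in differences:
    print(f"token {position}: {column} {tag_a} -> {tag_b}")
```

The only POS difference is the expected VBZ/VBD on the verb, while the NER column flips from CITY to PERSON on "Victoria", which is the change that lines up with coref succeeding.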

Here I fine-tune the NER via RegexNER, enabling more robust coref (in each case, "She" -> "Victoria").

Sent  index WORD             LEMMA            POS              (Regex)NER     
   0  1     Victoria         Victoria         NNP              PERSON         
   0  2     lives            live             VBZ              O              
   0  3     in               in               IN               O              
   0  4     Vancouver        Vancouver        NNP              CITY           
   0  5     .                .                .                O              
   1  1     She              she              PRP              O              
   1  2     likes            like             VBZ              O              
   1  3     apples           apple            NNS              FRUIT          
   1  4     .                .                .                O              

Sent  index WORD             LEMMA            POS              (Regex)NER     
   0  1     Victoria         Victoria         NNP              PERSON         
   0  2     lived            live             VBD              O              
   0  3     in               in               IN               O              
   0  4     Vancouver        Vancouver        NNP              CITY           
   0  5     .                .                .                O              
   1  1     She              she              PRP              O              
   1  2     likes            like             VBZ              O              
   1  3     apples           apple            NNS              FRUIT          
   1  4     .                .                .                O              
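
For anyone wanting to try something similar, below is a rough sketch of what such a RegexNER mapping could look like. The exact rules behind the tables above are not shown in this thread, so the two rules here are assumptions chosen to match the output. The layout (tab-separated pattern, label, then an optional comma-separated list of labels the rule may overwrite) follows the RegexNER mapping file format as I understand it. The file name `custom.rules` is hypothetical, and the script only renders the text and a properties dict; it does not call CoreNLP.

```python
# Hypothetical rules: the thread does not show the mapping that was actually used.
# Columns: token pattern, NER label, optional labels this rule may overwrite.
RULES = [
    ("Victoria", "PERSON", "CITY,LOCATION"),
    ("apples", "FRUIT"),
]


def render_mapping(rules):
    """Render rules as tab-separated text, one rule per line."""
    return "".join("\t".join(rule) + "\n" for rule in rules)


mapping_text = render_mapping(RULES)
print(mapping_text, end="")

# Pipeline properties one might pass to CoreNLP. "custom.rules" is a
# hypothetical path where mapping_text would be saved; whether the rule
# overrides the statistical CITY label in a given release is untested here.
properties = {
    "annotators": "tokenize,ssplit,pos,lemma,ner,parse,coref",
    "ner.additional.regexner.mapping": "custom.rules",
}
```

A blanket "Victoria is always a PERSON" rule will of course mislabel the city in other text, so a rule like this only makes sense for a corpus where the name is known to refer to a person.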

AngledLuffa commented 4 years ago

Certainly this is not ideal, but I can also kind of see why it would happen. The city of Victoria is on Vancouver Island, and the location Victoria shows up in the training data quite often, so mentioning Victoria and Vancouver in a sentence is going to look very strongly like a city.

AngledLuffa commented 2 years ago

"antennae" labeled as "NN" instead of "NNS" in the sentence "Jennifer has the prettiest antennae"