Open J38 opened 6 years ago
More issues with pos tagging:
Any updates on these issues? Were they solved in StanfordNLP?
I noticed that coreference resolution fails on 'Victoria lives in Vancouver. She likes apples.' >> corefs: {}
Sent index WORD LEMMA POS (Regex)NER
0 1 Victoria Victoria NNP CITY
0 2 lives live VBZ O
0 3 in in IN O
0 4 Vancouver Vancouver NNP CITY
0 5 . . . O
1 1 She she PRP O
1 2 likes like VBZ O
1 3 apples apple NNS O
1 4 . . . O
Coreference resolution succeeds on 'Victoria lived in Vancouver. She likes apples.' >> corefs: "She" --> "Victoria"
Sent index WORD LEMMA POS (Regex)NER
0 1 Victoria Victoria NNP PERSON
0 2 lived live VBD O
0 3 in in IN O
0 4 Vancouver Vancouver NNP CITY
0 5 . . . O
1 1 She she PRP O
1 2 likes like VBZ O
1 3 apples apple NNS O
1 4 . . . O
Live can be a verb (lives in | lived in), an adjective (live performance), or a noun (our lives matter). Hence I initially thought that misidentifying verbs (or even verb tense) was the issue leading to the coreference resolution issue (above) -- but looking at the tags (above), I realized it was in fact a NER issue.
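For anyone who wants to repeat that comparison mechanically, here is a minimal Python sketch that diffs the POS and NER columns of the two tables above. The rows are copied from this comment; the names `parse_rows` and `tag_differences` are made up for illustration and are not part of CoreNLP.

```python
# Column layout of the tables pasted above.
COLUMNS = ("sent", "index", "word", "lemma", "pos", "ner")

# 'Victoria lives in Vancouver. She likes apples.' (coref fails)
FAILING = """\
0 1 Victoria Victoria NNP CITY
0 2 lives live VBZ O
0 3 in in IN O
0 4 Vancouver Vancouver NNP CITY
0 5 . . . O
1 1 She she PRP O
1 2 likes like VBZ O
1 3 apples apple NNS O
1 4 . . . O
"""

# 'Victoria lived in Vancouver. She likes apples.' (coref succeeds)
WORKING = """\
0 1 Victoria Victoria NNP PERSON
0 2 lived live VBD O
0 3 in in IN O
0 4 Vancouver Vancouver NNP CITY
0 5 . . . O
1 1 She she PRP O
1 2 likes like VBZ O
1 3 apples apple NNS O
1 4 . . . O
"""


def parse_rows(table):
    """Turn a whitespace-separated tag table into a list of dicts."""
    return [dict(zip(COLUMNS, line.split())) for line in table.strip().splitlines()]


def tag_differences(a, b, columns=("pos", "ner")):
    """Return (sent, index, column, value_a, value_b) for every tag that differs."""
    diffs = []
    for row_a, row_b in zip(parse_rows(a), parse_rows(b)):
        for col in columns:
            if row_a[col] != row_b[col]:
                diffs.append((row_a["sent"], row_a["index"], col, row_a[col], row_b[col]))
    return diffs


if __name__ == "__main__":
    for diff in tag_differences(FAILING, WORKING):
        print(diff)
```

The POS difference on token 2 (VBZ vs VBD) only reflects the changed input word; the difference that matters for coref is the NER tag on "Victoria" (CITY vs PERSON).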
Here I fine-tune the NER via RegexNER, enabling more robust coref (in each case, "She" -> "Victoria").
Sent index WORD LEMMA POS (Regex)NER
0 1 Victoria Victoria NNP PERSON
0 2 lives live VBZ O
0 3 in in IN O
0 4 Vancouver Vancouver NNP CITY
0 5 . . . O
1 1 She she PRP O
1 2 likes like VBZ O
1 3 apples apple NNS FRUIT
1 4 . . . O
Sent index WORD LEMMA POS (Regex)NER
0 1 Victoria Victoria NNP PERSON
0 2 lived live VBD O
0 3 in in IN O
0 4 Vancouver Vancouver NNP CITY
0 5 . . . O
1 1 She she PRP O
1 2 likes like VBZ O
1 3 apples apple NNS FRUIT
1 4 . . . O
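The exact RegexNER rules behind these tables are not shown in this thread. As a rough sketch only, rules like the following would be one way to get the PERSON and FRUIT tags above. It assumes the usual tab-separated RegexNER mapping layout (pattern, label, optional comma-separated overwritable types, optional priority); the two rules and the helper name `mapping_line` are hypothetical.

```python
def mapping_line(pattern, label, overwritable=(), priority=None):
    """Build one tab-separated RegexNER mapping line.

    Assumed column order: pattern, label, overwritable types, priority.
    The last two columns are optional.
    """
    fields = [pattern, label]
    if overwritable or priority is not None:
        fields.append(",".join(overwritable))
    if priority is not None:
        fields.append(str(priority))
    return "\t".join(fields)


# Hypothetical rules, NOT the ones actually used for the tables above.
RULES = [
    # Let "Victoria" be relabeled PERSON even if the model said CITY/LOCATION.
    mapping_line("Victoria", "PERSON", overwritable=("CITY", "LOCATION"), priority=1.0),
    # Tag "apple" / "apples" with a custom FRUIT label.
    mapping_line("apples?", "FRUIT"),
]


def write_mapping(path, rules=RULES):
    """Write the rules to a mapping file, one rule per line."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("\n".join(rules) + "\n")


if __name__ == "__main__":
    print("\n".join(RULES))
```

A blanket rule like the first one would also relabel genuine mentions of the city of Victoria, so it only makes sense for text where "Victoria" is known to be a person. How the mapping file is passed to the pipeline depends on the CoreNLP version; check the RegexNER documentation for the property name.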
Certainly this is not ideal, but I can also kind of see why it would happen. The city of Victoria is on Vancouver Island, and the location Victoria shows up in the training data quite often, so when a sentence mentions both Victoria and Vancouver, Victoria is going to look very strongly like a city.
"antennae" labeled as "NN" instead of "NNS" in the sentence "Jennifer has the prettiest antennae"
The statistical model for part-of-speech tagging is not perfect, due to limitations of both the algorithm and the training data. We should use this thread to catalog errors that users identify.