arademaker opened this issue 7 years ago
Explaining: below are all the cases that SyntaxNet tagged as PUNCT and that Freeling tagged as Fc (comma), Fh (hyphen), etc.
| count | SyntaxNet | Freeling |
|------:|-----------|----------|
| 855   | PUNCT     | Fc       |
| 2     | PUNCT     | Fh       |
| 19680 | PUNCT     | Fp       |
| 19    | PUNCT     | NN       |
Sadly, those numbers are nothing without the examples, but I don't see an easy way to show the examples.
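One possibility might be to keep a handful of example tokens per tag pair while counting. A minimal sketch in Python, assuming the two outputs are aligned token-per-line TSV files with the token in the first column and the POS tag in the second (the file names and column layout here are just assumptions):

```python
from collections import Counter, defaultdict

def compare_tags(parsey_path, freeling_path, max_examples=5):
    """Count (SyntaxNet, Freeling) tag pairs and keep a few example tokens for each.

    Assumes both files are tab-separated, one token per line, with the token in the
    first column and the POS tag in the second, and that the two files are aligned.
    """
    counts = Counter()
    examples = defaultdict(list)
    with open(parsey_path, encoding="utf-8") as p, open(freeling_path, encoding="utf-8") as f:
        for p_line, f_line in zip(p, f):
            p_token, p_tag = p_line.rstrip("\n").split("\t")[:2]
            _f_token, f_tag = f_line.rstrip("\n").split("\t")[:2]
            pair = (p_tag, f_tag)
            counts[pair] += 1
            if len(examples[pair]) < max_examples:
                examples[pair].append(p_token)
    return counts, examples

counts, examples = compare_tags("parsey.tsv", "freeling.tsv")
for (p_tag, f_tag), n in counts.most_common():
    print(f"{n}\t{p_tag}\t{f_tag}\t{', '.join(examples[(p_tag, f_tag)])}")
```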
We can also try to compare the XPOSTAGS fields to the Freeling POS tags, since it appears that the tagsets match... and maybe only list the differences?
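For what it's worth, a rough sketch of that comparison, assuming the SyntaxNet output is in CoNLL-U format (FORM in column 2, XPOSTAG in column 5) and the Freeling output is a token-per-line "token<TAB>tag" file aligned with it (file names and formats are assumptions); only the disagreements are printed:

```python
def xpos_differences(conllu_path, freeling_path):
    """Yield (token, xpostag, freeling_tag) for tokens where the two taggers disagree.

    Assumes the CoNLL-U token lines and the Freeling token lines are aligned.
    """
    with open(freeling_path, encoding="utf-8") as f:
        freeling = [line.rstrip("\n").split("\t")[:2] for line in f if line.strip()]
    i = 0
    with open(conllu_path, encoding="utf-8") as c:
        for line in c:
            if not line.strip() or line.startswith("#"):
                continue  # skip blank lines and sentence-level comments
            cols = line.rstrip("\n").split("\t")
            token, xpos = cols[1], cols[4]  # FORM and XPOSTAG columns
            _f_token, f_tag = freeling[i]
            i += 1
            if xpos != f_tag:
                yield token, xpos, f_tag

for token, xpos, f_tag in xpos_differences("parsey.conllu", "freeling.tsv"):
    print(f"{token}\t{xpos}\t{f_tag}")
```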
@arademaker but the numbers are good, they show there are only 19 cases where SyntaxNet messed it up, no?
don't you want to call the issue mismatch? mismake doesn't make much sense to me...
@vcvpaiva for the PUNCT cases, yes. But PUNCT should be easy, don't you think?
yes, PUNCT should be easy, and it's proving not to be. but if Freeling gets all these right, then we can build on the two of them together. they claim 90% accuracy in pos-tagging, and that it's SOTA, right? meaning 2000 sentences wrong in 20K.
I don't agree with the claim that they are SOTA; we will have to check the numbers. 2K sentences wrong? But we have repetitions, and the POS tag errors can really affect much more than 2K sentences. Basically, we can try to reproduce the analysis below on this corpus:
C. D. Manning, "Part-of-Speech Tagging from 97% to 100%: Is It Time for Some Linguistics?," in Proc. CICLing 2011. [Online]. Available: http://nlp.stanford.edu/pubs/CICLing2011-manning-tagging.pdf. [Accessed: 28-Apr-2015].
> Sadly, those numbers are nothing without the examples, but I don't see an easy way to show the examples.
well, for the punctuation cases we know all the examples: they are the 19 cases, 12 of "whisk" and 7 of "ear" tagged as punctuation marks.
but yes, my bad. @fcbr said 90% accuracy on pos-tagging for this model, but this is token accuracy, which I misread as sentence accuracy.
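To make the token vs. sentence accuracy distinction concrete, a back-of-the-envelope sketch (the 20-token average sentence length and the independence of errors are both assumptions):

```python
# Why 90% token accuracy is far from 90% sentence accuracy.
token_accuracy = 0.90
avg_tokens_per_sentence = 20      # assumed average sentence length
sentences = 20_000

# Probability that every token in a sentence is tagged correctly,
# assuming (unrealistically) independent errors.
sentence_accuracy = token_accuracy ** avg_tokens_per_sentence   # ~0.12

wrong_sentences = sentences * (1 - sentence_accuracy)
print(f"expected sentences with at least one tagging error: {wrong_sentences:.0f}")
# roughly 17,500 of 20,000 -- far more than the 2,000 that a 90% *sentence* accuracy would imply
```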
I can cope with the lack of accuracy of SyntaxNet, but have to confess that the SUMO mappings, which were my main goal, are looking too bad now, see #10.
@vcvpaiva
> I can cope with the lack of accuracy of SyntaxNet, but have to confess that the SUMO mappings, which were my main goal, are looking too bad now
Let us not blame the mappings but the WSD process. But we can think about how to improve it.
@arademaker Can you re-run the script above to find the discrepancies between Parsey and FreeLing? The same kind of table that we have above, but calculated over the 6076 "normalized" sentences, is what I am asking for.
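In case it helps, a rough sketch of what that restricted re-run could look like, assuming a file with the ids of the 6076 normalized sentences and a per-token table with sentence id, token, SyntaxNet tag, and Freeling tag (all file names and columns are hypothetical):

```python
from collections import Counter

# Hypothetical inputs: one normalized sentence id per line, and a tab-separated
# per-token table with columns sent_id, token, parsey_tag, freeling_tag.
with open("normalized-sentences.txt", encoding="utf-8") as f:
    normalized_ids = {line.strip() for line in f if line.strip()}

counts = Counter()
with open("tag-pairs.tsv", encoding="utf-8") as rows:
    for row in rows:
        sent_id, _token, p_tag, f_tag = row.rstrip("\n").split("\t")
        if sent_id in normalized_ids:
            counts[(p_tag, f_tag)] += 1

# Same kind of table as above, restricted to the normalized sentences.
for (p_tag, f_tag), n in counts.most_common():
    print(f"{n}\t{p_tag}\t{f_tag}")
```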