arademaker opened this issue 7 years ago
Explaining: below are all the cases that SyntaxNet tagged as PUNCT and that Freeling tagged as Fc (comma), Fh (hyphen), etc.
| count | SyntaxNet | Freeling |
|------:|-----------|----------|
| 855   | PUNCT     | Fc       |
| 2     | PUNCT     | Fh       |
| 19680 | PUNCT     | Fp       |
| 19    | PUNCT     | NN       |
Sadly, those numbers are nothing without the examples, but I don't see an easy way to show the examples.
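One possibility might be to keep a handful of example tokens per tag pair while counting. A minimal sketch in Python, assuming the two outputs are aligned token-per-line TSV files with the token in the first column and the POS tag in the second (the file names and column layout here are just assumptions):

```python
from collections import Counter, defaultdict

def compare_tags(parsey_path, freeling_path, max_examples=5):
    """Count (SyntaxNet, Freeling) tag pairs and keep a few example tokens for each.

    Assumes both files are tab-separated, one token per line, with the token in the
    first column and the POS tag in the second, and that the two files are aligned.
    """
    counts = Counter()
    examples = defaultdict(list)
    with open(parsey_path, encoding="utf-8") as p, open(freeling_path, encoding="utf-8") as f:
        for p_line, f_line in zip(p, f):
            p_token, p_tag = p_line.rstrip("\n").split("\t")[:2]
            _f_token, f_tag = f_line.rstrip("\n").split("\t")[:2]
            pair = (p_tag, f_tag)
            counts[pair] += 1
            if len(examples[pair]) < max_examples:
                examples[pair].append(p_token)
    return counts, examples

counts, examples = compare_tags("parsey.tsv", "freeling.tsv")
for (p_tag, f_tag), n in counts.most_common():
    print(f"{n}\t{p_tag}\t{f_tag}\t{', '.join(examples[(p_tag, f_tag)])}")
```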
We can also try to compare the XPOSTAGS fields to the Freeling POS tags, since it appears that the tagsets match... and maybe only list the differences?
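For what it's worth, a rough sketch of that comparison, assuming the SyntaxNet output is in CoNLL-U format (FORM in column 2, XPOSTAG in column 5) and the Freeling output is a token-per-line "token<TAB>tag" file aligned with it (file names and formats are assumptions); only the disagreements are printed:

```python
def xpos_differences(conllu_path, freeling_path):
    """Yield (token, xpostag, freeling_tag) for tokens where the two taggers disagree.

    Assumes the CoNLL-U token lines and the Freeling token lines are aligned.
    """
    with open(freeling_path, encoding="utf-8") as f:
        freeling = [line.rstrip("\n").split("\t")[:2] for line in f if line.strip()]
    i = 0
    with open(conllu_path, encoding="utf-8") as c:
        for line in c:
            if not line.strip() or line.startswith("#"):
                continue  # skip blank lines and sentence-level comments
            cols = line.rstrip("\n").split("\t")
            token, xpos = cols[1], cols[4]  # FORM and XPOSTAG columns
            _f_token, f_tag = freeling[i]
            i += 1
            if xpos != f_tag:
                yield token, xpos, f_tag

for token, xpos, f_tag in xpos_differences("parsey.conllu", "freeling.tsv"):
    print(f"{token}\t{xpos}\t{f_tag}")
```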
@arademaker but the numbers are good, they show there are only 19 cases where SyntaxNet messed it up, no?
don't you want to call the issue mismatch? mismake doesn't make much sense to me...
@vcvpaiva for the PUNCT cases, yes. But PUNCT should be easy, don't you think?
yes, PUNCT should be easy, and it's proving not to be. but if Freeling gets all these right, then we can build on the two of them together. they claim 90% accuracy in pos-tagging, and that it's SOTA, right? meaning 2000 sentences wrong in 20K.
I don't agree with the claim that they are SOTA; we will have to check the numbers. 2K sentences wrong? But we have repetitions, and the POS tag errors can really affect much more than 2K sentences. Basically, we can try to reproduce the analysis below on this corpus:
C. D. Manning, "Part-of-Speech Tagging from 97% to 100%: Is It Time for Some Linguistics?," in Proc. CICLing 2011. [Online]. Available: http://nlp.stanford.edu/pubs/CICLing2011-manning-tagging.pdf. [Accessed: 28-Apr-2015].
> Sadly, those numbers are nothing without the examples, but I don't see an easy way to show the examples.
well, for the punctuation cases we know all the examples: they are the 19 cases, 12 of "whisk" and 7 of "ear" tagged as punctuation marks.
but yes, my bad. @fcbr said 90% accuracy on pos-tagging for this model, but this is token accuracy, which I misread as sentence accuracy.
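To make the token vs. sentence accuracy distinction concrete, a back-of-the-envelope sketch (the 20-token average sentence length and the independence of errors are both assumptions):

```python
# Why 90% token accuracy is far from 90% sentence accuracy.
token_accuracy = 0.90
avg_tokens_per_sentence = 20      # assumed average sentence length
sentences = 20_000

# Probability that every token in a sentence is tagged correctly,
# assuming (unrealistically) independent errors.
sentence_accuracy = token_accuracy ** avg_tokens_per_sentence   # ~0.12

wrong_sentences = sentences * (1 - sentence_accuracy)
print(f"expected sentences with at least one tagging error: {wrong_sentences:.0f}")
# roughly 17,500 of 20,000 -- far more than the 2,000 that a 90% *sentence* accuracy would imply
```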
I can cope with the lack of accuracy of SyntaxNet, but have to confess that the SUMO mappings, which were my main goal, are looking too bad now, see #10.
@vcvpaiva
> I can cope with the lack of accuracy of SyntaxNet, but have to confess that the SUMO mappings, which were my main goal, are looking too bad now
Let us not blame the mappings but the WSD process. But we can think about how to improve it.
@arademaker Can you re-run the script above to find the discrepancies between Parsey and FreeLing? The same kind of table that we have above, but calculated over the 6076 "normalized" sentences, is what I am asking for.
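In case it helps, a rough sketch of what that restricted re-run could look like, assuming a file with the ids of the 6076 normalized sentences and a per-token table with sentence id, token, SyntaxNet tag, and Freeling tag (all file names and columns are hypothetical):

```python
from collections import Counter

# Hypothetical inputs: one normalized sentence id per line, and a tab-separated
# per-token table with columns sent_id, token, parsey_tag, freeling_tag.
with open("normalized-sentences.txt", encoding="utf-8") as f:
    normalized_ids = {line.strip() for line in f if line.strip()}

counts = Counter()
with open("tag-pairs.tsv", encoding="utf-8") as rows:
    for row in rows:
        sent_id, _token, p_tag, f_tag = row.rstrip("\n").split("\t")
        if sent_id in normalized_ids:
            counts[(p_tag, f_tag)] += 1

# Same kind of table as above, restricted to the normalized sentences.
for (p_tag, f_tag), n in counts.most_common():
    print(f"{n}\t{p_tag}\t{f_tag}")
```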