akolonin opened 5 years ago
PR #241 fixed the issue. PA and PQ results with MWC > 1 were affected on some corpora. The old results, obtained by parsing the GC Gold corpus, were:

| Corpus | PA/MWC1 | PA/MWC2 | PA/MWC3 | PA/MWC4 | PA/MWC5 | F1/MWC1 | F1/MWC2 | F1/MWC3 | F1/MWC4 | F1/MWC5 |
|---|---|---|---|---|---|---|---|---|---|---|
| abs | 87.53% | 100.00% | 0.00% | 0.00% | 0.00% | 0.5759 | 0.8542 | 0.0000 | 0.0000 | 0.0000 |
| any | 86.30% | 100.00% | 0.00% | 0.00% | 0.00% | 0.5742 | 0.8889 | 0.0000 | 0.0000 | 0.0000 |
| lge | 80.84% | 100.00% | 0.00% | 0.00% | 0.00% | 0.7771 | 0.9432 | 0.0000 | 0.0000 | 0.0000 |
| rnd | 88.44% | 100.00% | 0.00% | 0.00% | 0.00% | 0.5582 | 0.7295 | 0.0000 | 0.0000 | 0.0000 |
| seq | 85.01% | 100.00% | 0.00% | 0.00% | 0.00% | 0.6060 | 0.8542 | 0.0000 | 0.0000 | 0.0000 |
| w10 | 85.75% | 100.00% | 0.00% | 0.00% | 0.00% | 0.5213 | 0.6222 | 0.0000 | 0.0000 | 0.0000 |
| w6r | 85.89% | 100.00% | 0.00% | 0.00% | 0.00% | 0.5597 | 0.8056 | 0.0000 | 0.0000 | 0.0000 |

The new results are:

| Corpus | PA/MWC1 | PA/MWC2 | PA/MWC3 | PA/MWC4 | PA/MWC5 | F1/MWC1 | F1/MWC2 | F1/MWC3 | F1/MWC4 | F1/MWC5 |
|---|---|---|---|---|---|---|---|---|---|---|
| abs | 87.53% | 100.00% | 0.00% | 0.00% | 0.00% | 0.5759 | 0.8542 | 0.0000 | 0.0000 | 0.0000 |
| any | 86.30% | 96.43% | 0.00% | 0.00% | 0.00% | 0.5742 | 0.8497 | 0.0000 | 0.0000 | 0.0000 |
| lge | 80.84% | 100.00% | 0.00% | 0.00% | 0.00% | 0.7771 | 0.9432 | 0.0000 | 0.0000 | 0.0000 |
| rnd | 88.44% | 93.75% | 0.00% | 0.00% | 0.00% | 0.5582 | 0.6474 | 0.0000 | 0.0000 | 0.0000 |
| seq | 85.01% | 100.00% | 0.00% | 0.00% | 0.00% | 0.6060 | 0.8542 | 0.0000 | 0.0000 | 0.0000 |
| w10 | 85.75% | 88.10% | 0.00% | 0.00% | 0.00% | 0.5213 | 0.6231 | 0.0000 | 0.0000 | 0.0000 |
| w6r | 85.89% | 96.43% | 0.00% | 0.00% | 0.00% | 0.5597 | 0.7871 | 0.0000 | 0.0000 | 0.0000 |

Old results for the GC Silver corpus:

| Corpus | PA/MWC1 | PA/MWC2 | PA/MWC3 | PA/MWC4 | PA/MWC5 | F1/MWC1 | F1/MWC2 | F1/MWC3 | F1/MWC4 | F1/MWC5 |
|---|---|---|---|---|---|---|---|---|---|---|
| abs | 85.35% | 100.00% | 100.00% | 100.00% | 100.00% | 0.5183 | 0.6078 | 0.6112 | 0.6135 | 0.6164 |
| any | 82.34% | 100.00% | 100.00% | 100.00% | 100.00% | 0.4879 | 0.5914 | 0.6002 | 0.6049 | 0.6092 |
| lge | 75.22% | 100.00% | 100.00% | 100.00% | 100.00% | 0.7099 | 0.9421 | 0.9423 | 0.9409 | 0.9416 |
| rnd | 85.92% | 100.00% | 100.00% | 100.00% | 100.00% | 0.4958 | 0.5828 | 0.5861 | 0.5873 | 0.5901 |
| seq | 82.97% | 100.00% | 100.00% | 100.00% | 100.00% | 0.5535 | 0.6657 | 0.6689 | 0.6707 | 0.6726 |
| w10 | 80.47% | 100.00% | 100.00% | 100.00% | 100.00% | 0.4285 | 0.5258 | 0.5375 | 0.5435 | 0.5500 |
| w6r | 82.50% | 100.00% | 100.00% | 100.00% | 100.00% | 0.4908 | 0.5989 | 0.6075 | 0.6122 | 0.6179 |

New results for the GC Silver corpus:

| Corpus | PA/MWC1 | PA/MWC2 | PA/MWC3 | PA/MWC4 | PA/MWC5 | F1/MWC1 | F1/MWC2 | F1/MWC3 | F1/MWC4 | F1/MWC5 |
|---|---|---|---|---|---|---|---|---|---|---|
| abs | 85.35% | 94.30% | 94.81% | 95.34% | 95.41% | 0.5183 | 0.5719 | 0.5779 | 0.5829 | 0.5866 |
| any | 82.34% | 93.02% | 93.57% | 94.26% | 94.18% | 0.4879 | 0.5516 | 0.5623 | 0.5706 | 0.5739 |
| lge | 75.22% | 88.33% | 88.97% | 89.83% | 89.78% | 0.7099 | 0.8376 | 0.8434 | 0.8512 | 0.8516 |
| rnd | 85.92% | 93.67% | 94.19% | 94.64% | 94.63% | 0.4958 | 0.5412 | 0.5480 | 0.5524 | 0.5547 |
| seq | 82.97% | 93.71% | 94.40% | 95.06% | 95.16% | 0.5535 | 0.6227 | 0.6302 | 0.6362 | 0.6385 |
| w10 | 80.47% | 91.66% | 92.38% | 92.86% | 93.06% | 0.4285 | 0.4872 | 0.5010 | 0.5093 | 0.5160 |
| w6r | 82.50% | 93.33% | 93.61% | 94.44% | 94.39% | 0.4908 | 0.5588 | 0.5687 | 0.5776 | 0.5827 |

Old results for the GC FULL corpus:

| Corpus | PA/MWC1 | PA/MWC2 | F1/MWC1 | F1/MWC2 |
|---|---|---|---|---|
| abs | 84.15% | 100.00% | 0.4566 | 0.5585 |
| any | 80.17% | 100.00% | 0.4236 | 0.5438 |
| lge | 52.60% | 100.00% | 0.4939 | 0.9538 |
| rnd | 84.71% | 100.00% | 0.4361 | 0.5343 |
| seq | 80.24% | 100.00% | 0.4839 | 0.6148 |
| w10 | 77.15% | 100.00% | 0.3699 | 0.4811 |
| w6r | 79.72% | 100.00% | 0.4264 | 0.5492 |

New results for the GC FULL corpus:

| Corpus | PA/MWC1 | PA/MWC2 | F1/MWC1 | F1/MWC2 |
|---|---|---|---|---|
| abs | 84.15% | 90.02% | 0.4566 | 0.4912 |
| any | 80.17% | 86.07% | 0.4236 | 0.4593 |
| lge | 52.60% | 59.43% | 0.4939 | 0.5613 |
| rnd | 84.71% | 89.67% | 0.4361 | 0.4645 |
| seq | 80.24% | 87.39% | 0.4839 | 0.5251 |
| w10 | 77.15% | 84.73% | 0.3699 | 0.4085 |
| w6r | 79.72% | 86.54% | 0.4264 | 0.4672 |

The MWC estimation was wrong: square brackets, which mark unparsed words in the parser output, were left intact, which caused sentences containing unparsed words to be dropped. This heavily reduced the corpora, leaving only fully parsed sentences. As a result, a different number of sentences was processed with each grammar, producing anomalously high PA and F1 values (for example, PA of 100.00% at MWC2 for every corpus in all three old tables). The fixed code processes the same number of sentences with every grammar at a given MWC setting. The values are somewhat lower because more sentences are now processed, and the corpora now include sentences with unparsed words as well.
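To make the failure mode concrete, here is a minimal Python sketch, assuming Link Grammar-style output where an unparsed token is written as `[word]` and a word-frequency table `counts` that does not depend on the grammar. The function names (`strip_brackets`, `filter_by_mwc`, `parse_ability`), the toy data, and the simplified PA metric (share of non-bracketed tokens) are hypothetical and not taken from the project code. MWC <= 1 is assumed to mean "no filtering", which is consistent with the MWC1 columns being unchanged between the old and new tables.

```python
import re
from collections import Counter

# Assumption: unparsed tokens are written as "[word]", Link Grammar style.
BRACKETED = re.compile(r"^\[(.*)\]$")


def strip_brackets(token: str) -> str:
    """Return the bare word, e.g. '[sleeps]' -> 'sleeps'."""
    m = BRACKETED.match(token)
    return m.group(1) if m else token


def filter_by_mwc(sentences, counts, mwc, strip=True):
    """Keep sentences whose every word occurs at least `mwc` times in `counts`.

    strip=False reproduces the bug: a bracketed token is looked up as-is,
    gets a count of 0 from the Counter, and the whole sentence is dropped.
    """
    if mwc <= 1:  # assumption: MWC=1 applies no filter
        return list(sentences)
    kept = []
    for tokens in sentences:
        words = [strip_brackets(t) if strip else t for t in tokens]
        if all(counts[w] >= mwc for w in words):
            kept.append(tokens)
    return kept


def parse_ability(sentences):
    """Simplified, hypothetical PA: fraction of tokens that are not bracketed."""
    total = sum(len(s) for s in sentences)
    parsed = sum(1 for s in sentences for t in s if not BRACKETED.match(t))
    return parsed / total if total else 0.0


# Hypothetical toy data: word frequencies and the parses from one grammar.
counts = Counter({"the": 5, "dog": 3, "barks": 2, "cat": 2, "sleeps": 2})
parses = [["the", "dog", "barks"], ["the", "cat", "[sleeps]"]]

buggy = filter_by_mwc(parses, counts, 2, strip=False)
fixed = filter_by_mwc(parses, counts, 2)
print(len(buggy), f"{parse_ability(buggy):.2%}")  # 1 100.00%
print(len(fixed), f"{parse_ability(fixed):.2%}")  # 2 83.33%
```

Because each grammar leaves a different set of words unparsed, the buggy filter keeps a different subset of sentences for each grammar and only ever keeps fully parsed ones, so PA is pinned at 100%. With brackets stripped, the kept subset depends only on `counts` and the MWC value, so every grammar is evaluated on the same sentences.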