ondrejklejch / MT-ComparEval

Tool for comparison and evaluation of machine translation.
Apache License 2.0

double-check BLEU computation #5

Closed: martinpopel closed this issue 8 years ago

martinpopel commented 9 years ago

Make sure the BLEU (and wordF) scores reported at http://workbench.qtleap.eu/evaluation/ agree exactly with the scores reported at http://qtleap.eu/wp-content/uploads/2015/04/QTLEAP-2015-D2.4.pdf#page=36

ondrejklejch commented 9 years ago

I think that I fixed the problem. I compared the BLEU scores with compute_bleu_for_all_pilots.pl and everything is correct. The problem was in normalization, tokenization, and UTF-8 support. Please see ba51d3b38f24769ac49b7b18f2be56aaecd7a1ac and 417c7d3f412b119ec25387977a2f95868d1e9f64.
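For anyone comparing the numbers: the normalization and tokenization that mteval-v13a.pl applies before counting n-grams can be approximated in Python roughly as below. This is only an illustrative sketch of the NIST rules (the Perl script stays the reference), not the actual MT-ComparEval code.

```python
import re

def mteval_v13a_tokenize(line: str) -> str:
    """Rough approximation of the mteval-v13a.pl normalization and
    tokenization (illustrative only; the Perl script is authoritative)."""
    # Language-independent part: drop <skipped> tags, end-of-line
    # hyphenation, and common SGML entities.
    line = line.replace('<skipped>', '')
    line = re.sub(r'-\n', '', line)
    line = line.replace('\n', ' ')
    line = line.replace('&quot;', '"').replace('&amp;', '&')
    line = line.replace('&lt;', '<').replace('&gt;', '>')
    # Language-dependent part (assuming Western languages).
    line = ' ' + line + ' '
    # Tokenize punctuation.
    line = re.sub(r'([\{-\~\[-\` -\&\(-\+\:-\@\/])', r' \1 ', line)
    # Tokenize period and comma unless preceded by a digit.
    line = re.sub(r'([^0-9])([\.,])', r'\1 \2 ', line)
    # Tokenize period and comma unless followed by a digit.
    line = re.sub(r'([\.,])([^0-9])', r' \1 \2', line)
    # Tokenize dash when preceded by a digit.
    line = re.sub(r'([0-9])(-)', r'\1 \2 ', line)
    # Collapse whitespace.
    return ' '.join(line.split())
```

Getting these rules (plus proper UTF-8 handling) identical on both sides is usually what makes two BLEU implementations agree to the last digit.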

I also changed the smoothing in the precision and recall computation to be compatible with mteval-13a.pl. Please see 63cf3dfe8f2475191429d31b0f3101d6a4bdaffd.
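For reference, the smoothing used by mteval-13a.pl for segment-level scores is exponential: each n-gram order with zero matches gets precision 1/(2^k * total_n), where k counts the zero-match orders seen so far. A minimal Python sketch of that idea (the names `matches`, `totals`, etc. are assumptions for illustration, not MT-ComparEval code):

```python
import math

def smoothed_bleu(matches, totals, ref_len, hyp_len, max_n=4):
    """BLEU with mteval-v13a-style exponential smoothing (sketch).
    matches[n] / totals[n] hold clipped n-gram matches and candidate
    n-gram counts for order n (1-based)."""
    smooth = 1.0
    log_prec_sum = 0.0
    for n in range(1, max_n + 1):
        if totals[n] == 0:
            return 0.0  # segment shorter than n; nothing to score
        if matches[n] == 0:
            smooth *= 2.0
            prec = 1.0 / (smooth * totals[n])  # smoothed precision
        else:
            prec = matches[n] / totals[n]
        log_prec_sum += math.log(prec)
    # Brevity penalty as in the standard BLEU definition.
    bp = 1.0 if hyp_len > ref_len else math.exp(1.0 - ref_len / max(hyp_len, 1))
    return bp * math.exp(log_prec_sum / max_n)
```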

martinpopel commented 9 years ago

Great, thanks. I am about to close this issue; I am just waiting until @lefterav updates http://workbench.qtleap.eu/evaluation/. (GitHub does not allow assigning an issue to multiple people, as you can see from my unsuccessful attempts :-).)

martinpopel commented 9 years ago

The BLEU scores seem to be correct, but the original BLEU (mteval-13a.pl without the -c switch) is case-insensitive, while in MT-ComparEval BLEU means case-sensitive BLEU. The case-insensitive variant is called BLEU-cis here and is shown only in the Statistics tab. I think we will need to change this (but that is not related to this issue).
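For the record, the only difference between the two variants is whether tokens are lowercased before n-gram matching (mteval-13a.pl lowercases by default and preserves case only with -c). A tiny illustrative helper with a hypothetical name, not MT-ComparEval code:

```python
def normalize_case(tokens, case_sensitive=False):
    """Return tokens as-is for case-sensitive BLEU, lowercased for the
    case-insensitive variant (BLEU-cis). Illustration only."""
    return list(tokens) if case_sensitive else [t.lower() for t in tokens]
```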

martinpopel commented 8 years ago

Closed with #51