stefan-it / nmt-en-vi

Neural Machine Translation system for English to Vietnamese (IWSLT'15 English-Vietnamese data)

Bleu Score to Use #8

Open kaeflint opened 4 years ago

kaeflint commented 4 years ago

Thank you for the dataset. I am new to NMT, and the BLEU score from tensor2tensor's t2t-bleu differs from the one I get with 'multi-bleu.perl'. For example, t2t-bleu reports 28.13 (uncased) and 27.391 (cased), while multi-bleu.perl reports 31.1 (uncased) and 30.3 (cased). Which one should I report?

andmek commented 4 years ago

When calculating the BLEU score with the built-in t2t-bleu tool, the test data should be detokenized first. In that case, the score will be even lower than the reported 28.54 (cased). I got a result of 27.5 (cased) using sacrebleu (https://github.com/mjpost/sacreBLEU) with the signature BLEU+case.mixed+numrefs.1+smooth.exp+tok.13a+version.1.4.9.
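
To illustrate why the numbers in this thread disagree, here is a minimal sketch of how the choice of tokenization alone can move a BLEU score. It is a simplified, single-reference, unsmoothed BLEU written for this example, and it is not the implementation used by t2t-bleu, multi-bleu.perl, or sacrebleu. The helper names (`ngrams`, `simple_corpus_bleu`, `split_punct`) and the toy sentences are hypothetical.

```python
import math
import re
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams of order n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def simple_corpus_bleu(hyps, refs, max_n=4):
    """Simplified corpus BLEU (0-100): one reference per segment, no smoothing."""
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hyps, refs):
        hyp_len += len(hyp)
        ref_len += len(ref)
        for n in range(1, max_n + 1):
            h, r = ngrams(hyp, n), ngrams(ref, n)
            matches[n - 1] += sum((h & r).values())  # clipped n-gram matches
            totals[n - 1] += sum(h.values())
    if min(matches) == 0:
        return 0.0  # without smoothing, any zero precision gives BLEU = 0
    log_prec = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    brevity = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * brevity * math.exp(log_prec)


def split_punct(text):
    """Toy tokenizer (an assumption for this sketch): split punctuation off words."""
    return re.findall(r"\w+|[^\w\s]", text)


reference = "Hello, world. This is a small test, okay?"
hypothesis = "Hello, world. This is a tiny test, okay?"

# Same sentences, two tokenizations, two different scores.
bleu_whitespace = simple_corpus_bleu([hypothesis.split()], [reference.split()])
bleu_punct = simple_corpus_bleu([split_punct(hypothesis)], [split_punct(reference)])
print(f"whitespace tokens:        {bleu_whitespace:.2f}")
print(f"punctuation split tokens: {bleu_punct:.2f}")
```

In this toy case, splitting punctuation into separate tokens adds extra matching n-grams, so the score is higher than with whitespace tokens. multi-bleu.perl scores whatever tokenization the files already have, while sacrebleu applies its own tokenizer (13a in the signature above) to detokenized text, so the two are not directly comparable. Reporting the sacrebleu score together with its signature lets others reproduce the number.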