tensorflow / nmt

TensorFlow Neural Machine Translation Tutorial
Apache License 2.0

Potential error in BLEU score computation #450

Open sluks opened 4 years ago

sluks commented 4 years ago

Code in question is here: https://github.com/tensorflow/nmt/blob/master/nmt/scripts/bleu.py#L70

When computing the reference length, the current version takes the length of the shortest reference for each segment and sums those minimums over the corpus. I'm not sure whether this was done on purpose, but the original BLEU paper (section 2.2.2) suggests using the length of the reference translation that is closest to the length of the candidate translation.

An example:

candidate_corpus = [['I', 'ate', 'an', 'apple']]
references_corpus = [[['Bye'], ['I', 'ate', 'an', 'apple', 'here']]]
compute_bleu(candidate_corpus, references_corpus, max_order=4, smooth=False)

The output of this is (1.0, [1.0, 1.0, 1.0, 1.0], 1.0, 4.0, 4, 1).

Specifically, the reference length (the last number of the output) is 1, because ['Bye'] is the shortest of the references. It should be 5, since the other reference is closest in length to the candidate.
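To make the suggestion concrete, here is a minimal sketch of the "closest reference length" rule from the paper. It is not code from bleu.py: the helper names `closest_reference_length` and `effective_reference_length` are hypothetical, and breaking ties in favor of the shorter reference is my own assumption, since the paper does not say how to handle them.

```python
def closest_reference_length(candidate, references):
    """Return the length of the reference closest in length to the candidate.

    Hypothetical helper, not part of bleu.py. Ties are broken in favor of
    the shorter reference, which is an assumption made for this sketch.
    """
    candidate_length = len(candidate)
    return min(
        (len(reference) for reference in references),
        # Sort key: distance to the candidate length first, then length itself.
        key=lambda ref_len: (abs(ref_len - candidate_length), ref_len),
    )


def effective_reference_length(candidate_corpus, references_corpus):
    """Sum the closest reference length over all segments in the corpus."""
    return sum(
        closest_reference_length(candidate, references)
        for candidate, references in zip(candidate_corpus, references_corpus)
    )


candidate_corpus = [['I', 'ate', 'an', 'apple']]
references_corpus = [[['Bye'], ['I', 'ate', 'an', 'apple', 'here']]]
print(effective_reference_length(candidate_corpus, references_corpus))  # prints 5
```

With a reference length of 5 and a candidate length of 4, the brevity penalty would be exp(1 - 5/4) ≈ 0.779 instead of 1.0, so the score for this example would drop accordingly.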