tensorflow / nmt

TensorFlow Neural Machine Translation Tutorial
Apache License 2.0
6.35k stars 1.96k forks source link

Smooth-BLEU bug fixed #488

Open Aktsvigun opened 2 years ago

Aktsvigun commented 2 years ago

Hi, the current implementation of smooth-BLEU contains a bug: it smoothes unigrams as well. Consequently, when both the reference and translation consist of totally different tokens, it anyway returns a non-zero value (please see the attached image).

This however contradicts the source paper suggesting the smooth-BLEU (Chin-Yew Lin, Franz Josef Och. ORANGE: a method for evaluating automatic evaluation metrics for machine translation. COLING 2004.) :

Add one count to the n-gram hit and total ngram count for n > 1. Therefore, for candidate translations with less than n words, they can still get a positive smoothed BLEU score from shorter n-gram matches; however if nothing matches then they will get zero scores.

This pull request aims at fixing this bug.

Снимок экрана 2022-06-29 в 17 39 42
google-cla[bot] commented 2 years ago

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.