jonathan-cohen-nvidia opened 4 years ago
I believe the 3003-line eval set should be allowable this round; let's discuss.
SWG:
We will merge this PR and handle in review the situation for submitters who did not use the 3003-line evaluation set. If there is any confusion, the review committee will resolve it.
If you are noticing issues with your Transformer converging, you may have the wrong evaluation dataset. Please do submit your Transformer this round even if you used the wrong evaluation set, and we will resolve this during review.
We believe the Transformer reference implementation points to an incorrect eval dataset here: https://github.com/mlperf/training/tree/master/translation#quality-metric
That eval dataset has 2737 lines.
In 0.6, both NVIDIA and Google used the eval dataset with 3003 lines, available here: http://www.statmt.org/wmt14/test-full.tgz
It appears that this was never updated in the reference.
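For submitters unsure which eval set they trained against, a quick line count distinguishes the two. This is a minimal sketch; the function names and the file path in the usage comment are hypothetical, and the only facts assumed from this thread are the two line counts (2737 for the set linked in the reference README, 3003 for the full WMT14 test set).

```python
def count_lines(path):
    """Count newline-delimited sentences in an eval file."""
    with open(path, encoding="utf-8") as f:
        return sum(1 for _ in f)

def identify_eval_set(path):
    """Classify an eval file by its line count (hypothetical helper).

    2737 lines -> the dataset linked in the reference README.
    3003 lines -> the full WMT14 test set used by NVIDIA and Google in 0.6.
    """
    n = count_lines(path)
    if n == 3003:
        return "full WMT14 test set (3003 lines)"
    if n == 2737:
        return "eval set from the reference README (2737 lines)"
    return f"unrecognized eval set ({n} lines)"

# Example (path is illustrative):
# print(identify_eval_set("newstest2014.de"))
```

The same check can be done from a shell with `wc -l` on the eval file.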
We submitted a fix here: https://github.com/mlperf/training/pull/392/files