jonathan-cohen-nvidia opened 4 years ago
I believe the 3003-line eval set should be allowable this round; let's discuss.
SWG:
We will merge this PR and handle in review the situation for submitters who did not use the 3003-line evaluation set. If there is any confusion, the review committee will resolve it.
If you are noticing issues with your Transformer converging, you may have the wrong evaluation dataset. Please do submit your Transformer this round even if you used the wrong evaluation set, and we will resolve this during review.
We believe the Transformer reference implementation points to an incorrect eval dataset here: https://github.com/mlperf/training/tree/master/translation#quality-metric
That eval dataset has 2737 lines.
In 0.6, both NVIDIA and Google used the eval dataset with 3003 lines, available here: http://www.statmt.org/wmt14/test-full.tgz
It appears that this was never updated in the reference.
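For submitters unsure which eval set they trained against, a quick line count distinguishes the two. This is a minimal sketch; the function names and the file path in the usage comment are hypothetical, and the only facts assumed from this thread are the two line counts (2737 for the set linked in the reference README, 3003 for the full WMT14 test set).

```python
def count_lines(path):
    """Count newline-delimited sentences in an eval file."""
    with open(path, encoding="utf-8") as f:
        return sum(1 for _ in f)

def identify_eval_set(path):
    """Classify an eval file by its line count (hypothetical helper).

    2737 lines -> the dataset linked in the reference README.
    3003 lines -> the full WMT14 test set used by NVIDIA and Google in 0.6.
    """
    n = count_lines(path)
    if n == 3003:
        return "full WMT14 test set (3003 lines)"
    if n == 2737:
        return "eval set from the reference README (2737 lines)"
    return f"unrecognized eval set ({n} lines)"

# Example (path is illustrative):
# print(identify_eval_set("newstest2014.de"))
```

The same check can be done from a shell with `wc -l` on the eval file.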
We submitted a fix here: https://github.com/mlperf/training/pull/392/files