Evaluation WER is higher than training WER, even when training on the eval dataset

mwawrzos commented 4 years ago

Possible reasons:

1. In the experiment, the evaluation pipeline differed slightly from the training pipeline:
   a. the train pipeline filtered out long sentences, while eval did not. This may have a significant influence;
   b. the train pipeline used SpecAugment, while eval did not. This should not influence eval.
2. A bug in eval.
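
To check whether the filtering difference explains the gap, the eval set can be passed through the same length filter as the training set before scoring. The snippet below is only a minimal sketch: the manifest layout (`transcript`, `duration`), the helper `filter_long_utterances`, and the `MAX_DURATION` value are hypothetical and not taken from the repository. It also assumes the filter is based on audio duration, which may not match how the train pipeline actually filters.

```python
from typing import Dict, List, Union

# Hypothetical manifest entry: a transcript plus the audio duration in seconds.
Utterance = Dict[str, Union[str, float]]


def filter_long_utterances(manifest: List[Utterance], max_duration: float) -> List[Utterance]:
    """Keep only utterances no longer than max_duration seconds."""
    return [u for u in manifest if float(u["duration"]) <= max_duration]


# Toy eval manifest (made-up values, for illustration only).
eval_manifest: List[Utterance] = [
    {"transcript": "hello world", "duration": 2.5},
    {"transcript": "a much longer utterance", "duration": 21.0},
    {"transcript": "short one", "duration": 16.0},
]

# Assumed threshold; use whatever value the train pipeline is configured with.
MAX_DURATION = 16.7

filtered_eval = filter_long_utterances(eval_manifest, MAX_DURATION)
print(len(eval_manifest), "->", len(filtered_eval), "utterances after filtering")
```

Running the evaluation on `filtered_eval` instead of the full set makes the train and eval numbers comparable on the same utterances.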

ryanleary commented 4 years ago

Has anyone tracked this down yet?

mwawrzos commented 4 years ago

I reran the evaluation on the filtered dataset. WER dropped to 0.0.
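
For reference, a WER of 0.0 means every hypothesis matches its reference word for word. The sketch below shows one common way to compute corpus-level WER (total word-level edit distance divided by total reference words). It is an illustration only; the function names are hypothetical and the repository's own scoring code may differ in details such as text normalization.

```python
from typing import List, Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Word-level Levenshtein distance (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(
                prev[j] + 1,             # deletion of a reference word
                cur[j - 1] + 1,          # insertion of a hypothesis word
                prev[j - 1] + (r != h),  # substitution, or match at zero cost
            ))
        prev = cur
    return prev[-1]


def word_error_rate(references: List[str], hypotheses: List[str]) -> float:
    """Corpus-level WER: total edit errors divided by total reference words."""
    errors = 0
    words = 0
    for ref, hyp in zip(references, hypotheses):
        ref_words, hyp_words = ref.split(), hyp.split()
        errors += edit_distance(ref_words, hyp_words)
        words += len(ref_words)
    return errors / words if words else 0.0


# Identical transcripts give a WER of exactly 0.0.
print(word_error_rate(["the cat sat"], ["the cat sat"]))
```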

ryanleary / mlperf-rnnt-ref