mlcommons / training

Reference implementations of MLPerf™ training benchmarks
https://mlcommons.org/en/groups/training
Apache License 2.0

[RNN-T] SpectAugment masks the last row/column #493

Closed: mwawrzos closed 2 years ago

mwawrzos commented 3 years ago

fixes #492

github-actions[bot] commented 3 years ago

MLCommons CLA bot All contributors have signed the MLCommons CLA ✍️ ✅

qpjaada commented 3 years ago

Hi @mwawrzos - did you check convergence runs with this change? Were they about the same as before?

mwawrzos commented 3 years ago

Hello @qpjaada - here are the numbers with the spectrogram augmentation fixes and without bucketing (https://github.com/mlcommons/training_policies/pull/453), batch size 2048:

57, 59, 56, 59, 59, 58, 54, 59, 58, 56, 55, 57, 56, 54, 56, 56, 59, 52, 57, 55

average: 56.6, stdev: 1.984147702

For the same batch size, before the fix, the numbers look like this:

56, 56, 54, 56, 57, 58, 55, 56, 60, 56, 58, 57, 53, 60, 57, 57, 55, 57, 56, 56

average: 56.5, stdev: 1.701392618

On a histogram, it looks like this:

[image: RNN-T spectrogram augmentation fix]
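Since the histogram image does not survive in text form, here is a minimal sketch that rebuilds a rough text histogram from the two lists of numbers quoted in this thread. The function name `text_histogram` and the variable names are hypothetical, and the thread does not state what the per-run values measure, so the code treats them as plain integers.

```python
from collections import Counter

# Per-run results as quoted in this thread (batch size 2048).
with_fix = [57, 59, 56, 59, 59, 58, 54, 59, 58, 56,
            55, 57, 56, 54, 56, 56, 59, 52, 57, 55]
before_fix = [56, 56, 54, 56, 57, 58, 55, 56, 60, 56,
              58, 57, 53, 60, 57, 57, 55, 57, 56, 56]


def text_histogram(runs):
    """Return one line per integer value from min(runs) to max(runs),
    with one '#' for each run that produced that value."""
    counts = Counter(runs)
    lines = []
    for value in range(min(runs), max(runs) + 1):
        lines.append(f"{value:>3} | {'#' * counts[value]}")
    return "\n".join(lines)


if __name__ == "__main__":
    for label, runs in (("with fix", with_fix), ("before fix", before_fix)):
        print(label)
        print(text_histogram(runs))
        print()
```

This only approximates the original plot: bin width is fixed at one unit and the two distributions are printed one after the other instead of overlaid.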

qpjaada commented 3 years ago

Thanks @mwawrzos. Seems like convergence is a bit better after the fix, although based on your numbers I am getting:

with fix: average: 56.6, stdev: 1.9339079605813716

before fix: average: 58.5, stdev: 1.6278820596099706
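For anyone re-checking these figures, the sketch below recomputes the summary statistics from the two lists as they appear in this thread, using Python's standard `statistics` module. The names `with_fix`, `before_fix`, and `summarize` are hypothetical. On the "with fix" list, the two stdev values quoted in the thread appear to correspond to the sample (n-1) and population (n) definitions respectively. The lists as shown give an average of 56.5 for the "before fix" runs, so the 58.5 figure is presumably based on the comment before it was edited.

```python
import statistics

# Per-run results as quoted in this thread (batch size 2048).
with_fix = [57, 59, 56, 59, 59, 58, 54, 59, 58, 56,
            55, 57, 56, 54, 56, 56, 59, 52, 57, 55]
before_fix = [56, 56, 54, 56, 57, 58, 55, 56, 60, 56,
              58, 57, 53, 60, 57, 57, 55, 57, 56, 56]


def summarize(runs):
    """Return the mean plus both common standard deviation definitions."""
    return {
        "mean": statistics.mean(runs),
        # Sample standard deviation: divides by n - 1.
        "sample_stdev": statistics.stdev(runs),
        # Population standard deviation: divides by n.
        "population_stdev": statistics.pstdev(runs),
    }


if __name__ == "__main__":
    for label, runs in (("with fix", with_fix), ("before fix", before_fix)):
        stats = summarize(runs)
        print(f"{label}: average: {stats['mean']}, "
              f"sample stdev: {stats['sample_stdev']:.9f}, "
              f"population stdev: {stats['population_stdev']:.9f}")
```

With 20 runs per configuration the two definitions differ only by a few percent, so the choice does not change the overall picture that the averages are close.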

mwawrzos commented 3 years ago

Thanks, I fixed my previous comment.