ryanleary / mlperf-rnnt-ref


CUDNN warnings when BatchNormalization is used #6

Closed mwawrzos closed 4 years ago

mwawrzos commented 4 years ago

The warning is a problem when a full training run is executed, as it inflates the log to roughly 400 MB instead of 2 MB.

The warning proposes a fix: "To compact weights again call flatten_parameters()." I am not sure yet where to call it.

Full warning below:

../aten/src/ATen/native/cudnn/RNN.cpp:1278: UserWarning: RNN module weights are not part of single contiguous chunk of memory. This means they need to be compacted at every call, possibly greatly increasing memory usage. To compact weights again call flatten_parameters().
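For reference, a minimal sketch of where the call could go, assuming the model wraps a standard torch.nn.LSTM. The Encoder class, layer sizes, and batch_first layout below are illustrative only, not the repo's actual modules:

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    # Hypothetical LSTM wrapper for illustration; sizes are placeholders.
    def __init__(self, input_size=240, hidden_size=1024, num_layers=2):
        super().__init__()
        self.lstm = nn.LSTM(input_size, hidden_size, num_layers, batch_first=True)

    def forward(self, x):
        # Re-compact the cuDNN weight buffer before the call, as the warning
        # suggests. It is a no-op when the weights are already contiguous and
        # silences the per-step warning otherwise.
        self.lstm.flatten_parameters()
        output, _ = self.lstm(x)
        return output
```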

Repro for commit https://github.com/ryanleary/mlperf-rnnt-ref/commit/4082f086ec4834886cceb927dbb1454eca44c68d:

train.py --batch_size=16 --eval_batch_size=4 --num_epochs=1000 --output_dir=/results --model_toml=configs/rnnt_bn.toml --lr=0.02 --seed=6 --optimizer=novograd --dataset_dir=/datasets/LibriSpeech --val_manifest=/datasets/LibriSpeech/librispeech-dev-clean-wav.json --train_manifest=/datasets/LibriSpeech/librispeech-train-clean-100-wav.json,/datasets/LibriSpeech/librispeech-train-clean-360-wav.json,/datasets/LibriSpeech/librispeech-train-other-500-wav.json --weight_decay=0.001 --save_freq=10 --eval_freq=1000 --train_freq=25 --gradient_accumulation_steps=4 --fp16 --cudnn
mwawrzos commented 4 years ago

According to a post on StackOverflow, batch normalization will probably not be used.