Closed by erogol 4 years ago
I had a go at this in this branch here.
There appears to be a problem with AMP support for RNNs in PyTorch 1.6.0; this PR is underway to fix it, and the issue is recorded here. I hit the problem as soon as I try to train (stack trace below). I suggest we wait until the fix makes it into a PyTorch release before resuming this effort.
> TRAINING (2020-08-07 00:12:02)
! Run is removed from /mnt/NovoMass/Data/models/ljspeech_models/dev_taco1/ljspeech-ddc-bn-August-07-2020_12+12AM-0000000
Traceback (most recent call last):
  File "/home/jim/dev/models/tensorflow/TTS/TTS/bin/train_tts.py", line 677, in <module>
    main(args)
  File "/home/jim/dev/models/tensorflow/TTS/TTS/bin/train_tts.py", line 588, in main
    train_avg_loss_dict, global_step = train(model, criterion, optimizer,
  File "/home/jim/dev/models/tensorflow/TTS/TTS/bin/train_tts.py", line 153, in train
    decoder_output, postnet_output, alignments, stop_tokens, decoder_backward_output, alignments_backward = model(
  File "/home/jim/anaconda3/envs/tts/lib/python3.8/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/jim/dev/models/tensorflow/TTS/TTS/tts/models/tacotron.py", line 104, in forward
    encoder_outputs = self.encoder(inputs)
  File "/home/jim/anaconda3/envs/tts/lib/python3.8/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/jim/dev/models/tensorflow/TTS/TTS/tts/layers/tacotron.py", line 244, in forward
    outputs = self.cbhg(outputs.transpose(1, 2))
  File "/home/jim/anaconda3/envs/tts/lib/python3.8/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/jim/dev/models/tensorflow/TTS/TTS/tts/layers/tacotron.py", line 223, in forward
    return self.cbhg(x)
  File "/home/jim/anaconda3/envs/tts/lib/python3.8/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/jim/dev/models/tensorflow/TTS/TTS/tts/layers/tacotron.py", line 204, in forward
    outputs, _ = self.gru(x)
  File "/home/jim/anaconda3/envs/tts/lib/python3.8/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/jim/anaconda3/envs/tts/lib/python3.8/site-packages/torch/nn/modules/rnn.py", line 734, in forward
    result = _VF.gru(input, hx, self._flat_weights, self.bias, self.num_layers,
RuntimeError: cuDNN error: CUDNN_STATUS_BAD_PARAM
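The trace ends in the GRU call inside the CBHG layer, so one possible stopgap is to keep just that recurrent layer out of autocast and run it in FP32. The sketch below is only an illustration of that idea, assuming the failure is confined to the cuDNN RNN call under autocast; it has not been verified against this error. `FP32GRU` is a hypothetical wrapper, not a class in the TTS code base, and the sizes are arbitrary.

```python
import torch
from torch import nn


class FP32GRU(nn.Module):
    """Hypothetical wrapper that runs a GRU in FP32 even inside an autocast region."""

    def __init__(self, input_size, hidden_size):
        super().__init__()
        self.gru = nn.GRU(input_size, hidden_size,
                          batch_first=True, bidirectional=True)

    def forward(self, x):
        # Locally disable autocast and cast the input back to FP32,
        # because upstream layers may have produced FP16 activations.
        with torch.cuda.amp.autocast(enabled=False):
            outputs, _ = self.gru(x.float())
        return outputs


# Arbitrary shapes for illustration: batch of 2, 5 time steps, 8 features.
layer = FP32GRU(input_size=8, hidden_size=4)
out = layer(torch.randn(2, 5, 8))
```

The rest of the model would still run under autocast; only the recurrent layer gives up the FP16 speed-up.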
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. You might also look at our Discourse page for further help: https://discourse.mozilla.org/c/tts
We just merged APEX AMP into the dev branch, but since PyTorch has now released native AMP support, it would be better to switch the implementation over to it. https://pytorch.org/blog/pytorch-1.6-released/
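For reference, a training step with the native API introduced in PyTorch 1.6 (`torch.cuda.amp.autocast` and `torch.cuda.amp.GradScaler`) might look roughly like the following. This is a minimal sketch, not the TTS training loop: the model, data, and `train_step` function are hypothetical stand-ins, and AMP is simply disabled when no CUDA device is available.

```python
import torch
from torch import nn

# AMP only applies on a CUDA device; otherwise fall back to plain FP32.
use_amp = torch.cuda.is_available()
device = torch.device("cuda" if use_amp else "cpu")

# Hypothetical stand-in model, not the Tacotron model from TTS.
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 1)).to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.MSELoss()
scaler = torch.cuda.amp.GradScaler(enabled=use_amp)


def train_step(inputs, targets):
    optimizer.zero_grad()
    # Forward pass and loss run under autocast so eligible ops use FP16.
    with torch.cuda.amp.autocast(enabled=use_amp):
        outputs = model(inputs)
        loss = criterion(outputs, targets)
    # Scale the loss to avoid FP16 gradient underflow; the scaler then
    # unscales before the optimizer step and adjusts its scale factor.
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
    return loss.item()


# Random data for illustration only.
inputs = torch.randn(8, 16, device=device)
targets = torch.randn(8, 1, device=device)
loss_value = train_step(inputs, targets)
```

Compared with APEX, there is no `amp.initialize` call wrapping the model and optimizer; the scaler and the autocast context are the only additions to an ordinary loop.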