mozilla / TTS

:robot: :speech_balloon: Deep learning for Text to Speech (Discussion forum: https://discourse.mozilla.org/c/tts)
Mozilla Public License 2.0

Support for Native Pytorch AMP #486

Closed erogol closed 4 years ago

erogol commented 4 years ago

We just merged APEX AMP into the dev branch, but since PyTorch 1.6 ships native AMP support, it would be better to switch the implementation over to it. https://pytorch.org/blog/pytorch-1.6-released/
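
For reference, a minimal sketch of what a training step looks like with native AMP (`torch.cuda.amp.autocast` plus `torch.cuda.amp.GradScaler`, both available since PyTorch 1.6). The toy `nn.Linear` model, the `train_step` helper, and the clipping threshold are placeholders for illustration only, not the actual code in `train_tts.py`. AMP is switched off automatically when no GPU is present, so the snippet also runs on CPU.

```python
import torch
from torch import nn

# Assumption: enable AMP only when CUDA is available; on CPU everything stays in fp32.
use_amp = torch.cuda.is_available()
device = torch.device("cuda" if use_amp else "cpu")

# Placeholder model, loss, and optimizer (stand-ins for the real TTS objects).
model = nn.Linear(8, 4).to(device)
criterion = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# GradScaler handles the loss scaling that apex.amp.scale_loss used to do.
scaler = torch.cuda.amp.GradScaler(enabled=use_amp)


def train_step(inputs, targets):
    """Run one optimization step with (optional) mixed precision."""
    optimizer.zero_grad()
    # The forward pass and loss run under autocast; backward stays outside it.
    with torch.cuda.amp.autocast(enabled=use_amp):
        outputs = model(inputs)
        loss = criterion(outputs.float(), targets)
    scaler.scale(loss).backward()
    # Unscale before clipping so the threshold applies to the true gradients.
    scaler.unscale_(optimizer)
    torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
    scaler.step(optimizer)  # skips the step if gradients contain inf/NaN
    scaler.update()
    return loss.item()
```

Compared with APEX there is no `amp.initialize` call and no opt level to pick; the model and optimizer are left untouched and only the forward pass and the optimizer step change.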

jyegerlehner commented 4 years ago

I had a go at this, in this branch here.

There appears to be a problem with AMP in PyTorch 1.6.0 when it is used with RNNs; this PR is underway to fix it. The issue is recorded here. I hit this problem when I try to train; the stack trace is shown below. I suggest we wait until the fix makes it into a PyTorch release before resuming this effort.

> TRAINING (2020-08-07 00:12:02) 
 ! Run is removed from /mnt/NovoMass/Data/models/ljspeech_models/dev_taco1/ljspeech-ddc-bn-August-07-2020_12+12AM-0000000
Traceback (most recent call last):
  File "/home/jim/dev/models/tensorflow/TTS/TTS/bin/train_tts.py", line 677, in <module>
    main(args)
  File "/home/jim/dev/models/tensorflow/TTS/TTS/bin/train_tts.py", line 588, in main
    train_avg_loss_dict, global_step = train(model, criterion, optimizer,
  File "/home/jim/dev/models/tensorflow/TTS/TTS/bin/train_tts.py", line 153, in train
    decoder_output, postnet_output, alignments, stop_tokens, decoder_backward_output, alignments_backward = model(
  File "/home/jim/anaconda3/envs/tts/lib/python3.8/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/jim/dev/models/tensorflow/TTS/TTS/tts/models/tacotron.py", line 104, in forward
    encoder_outputs = self.encoder(inputs)
  File "/home/jim/anaconda3/envs/tts/lib/python3.8/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/jim/dev/models/tensorflow/TTS/TTS/tts/layers/tacotron.py", line 244, in forward
    outputs = self.cbhg(outputs.transpose(1, 2))
  File "/home/jim/anaconda3/envs/tts/lib/python3.8/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/jim/dev/models/tensorflow/TTS/TTS/tts/layers/tacotron.py", line 223, in forward
    return self.cbhg(x)
  File "/home/jim/anaconda3/envs/tts/lib/python3.8/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/jim/dev/models/tensorflow/TTS/TTS/tts/layers/tacotron.py", line 204, in forward
    outputs, _ = self.gru(x)
  File "/home/jim/anaconda3/envs/tts/lib/python3.8/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/jim/anaconda3/envs/tts/lib/python3.8/site-packages/torch/nn/modules/rnn.py", line 734, in forward
    result = _VF.gru(input, hx, self._flat_weights, self.bias, self.num_layers,
RuntimeError: cuDNN error: CUDNN_STATUS_BAD_PARAM
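
The trace ends in the `self.gru(x)` call inside the CBHG module. One possible way to sidestep a failure of this kind, instead of waiting for the fix, is to run only the recurrent layer in fp32 by disabling autocast locally. This is a sketch under that assumption: the `Fp32GRU` wrapper is a hypothetical name, it is not taken from the branch mentioned above, and it has not been verified against this specific cuDNN error on PyTorch 1.6.0.

```python
import torch
from torch import nn


class Fp32GRU(nn.Module):
    """Hypothetical wrapper that keeps a GRU in fp32 inside an autocast region."""

    def __init__(self, input_size, hidden_size):
        super().__init__()
        self.gru = nn.GRU(input_size, hidden_size,
                          batch_first=True, bidirectional=True)

    def forward(self, x):
        # Turn autocast off for this block and cast the input back to fp32,
        # since x may already be fp16 if it came from an autocast-enabled op.
        with torch.cuda.amp.autocast(enabled=False):
            outputs, _ = self.gru(x.float())
        return outputs
```

The rest of the model still benefits from mixed precision; only the GRU gives up the fp16 speed-up and memory saving.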

stale[bot] commented 4 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. You might also look at our Discourse page for further help. https://discourse.mozilla.org/c/tts