When running vocoder training, it fails with this error (whole Traceback at the end):
File ".../lib/python3.6/site-packages/torch/functional.py", line 516, in stft
normalized, onesided, return_complex)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!
I've located the problem in the TorchSTFT class in TTS/vocoder/layers/losses.py, line 7. The problem is with self.window, which doesn't get transferred to CUDA when the loss function gets transferred via criterion_gen.cuda() in TTS/bin/train_vocoder_gan.py, line 536.
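For illustration, here's a hypothetical minimal repro (class names made up, not the actual TTS code): a plain tensor attribute is invisible to nn.Module's .cuda()/.to(), so the window stays on the CPU while the input ends up on cuda:0:

```python
import torch
from torch import nn

class PlainSTFT:  # stand-in for the old TorchSTFT: not an nn.Module
    def __init__(self, win_length=1024):
        # plain attribute: .cuda() on a parent module never sees this tensor
        self.window = torch.hann_window(win_length)

class GeneratorLoss(nn.Module):  # stand-in for the GAN criterion
    def __init__(self):
        super().__init__()
        self.stft = PlainSTFT()

criterion = GeneratorLoss().cuda()   # moves registered parameters/buffers only
print(criterion.stft.window.device)  # cpu -> torch.stft on a cuda input fails
```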
I've managed to solve this by subclassing torch.nn.Module and listing self.window as a parameter. This way .cuda() will transfer the window to CUDA and stft will work. Here's the PR #620
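A minimal sketch of that approach (assuming a hann window and the magnitude computation used by the STFT loss; see the PR for the actual change):

```python
import torch
from torch import nn

class TorchSTFT(nn.Module):  # subclass nn.Module instead of a plain class
    def __init__(self, n_fft, hop_length, win_length):
        super().__init__()
        self.n_fft = n_fft
        self.hop_length = hop_length
        self.win_length = win_length
        # a frozen Parameter is registered with the module, so .cuda()/.to()
        # moves the window together with the rest of the loss
        self.window = nn.Parameter(torch.hann_window(win_length),
                                   requires_grad=False)

    def forward(self, x):
        o = torch.stft(
            x,
            self.n_fft,
            self.hop_length,
            self.win_length,
            self.window,
            return_complex=False,  # keep the current behaviour explicitly
        )
        real, imag = o[..., 0], o[..., 1]
        return torch.sqrt(torch.clamp(real**2 + imag**2, min=1e-8))

# now .cuda() carries the window along and torch.stft sees a single device
stft = TorchSTFT(1024, 256, 1024).cuda()
mags = stft(torch.randn(4, 22050, device="cuda"))
```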
In it I've also added the parameter return_complex=False to torch.stft because of the reported future change in the default behaviour.
I'm wondering if torch just didn't report this in the previous version, and it just went ahead transferring to and from the cpu/gpu. In that sense this fix could speed up vocoder model training :thinking:

And here's the whole Traceback:
$ python TTS/bin/train_vocoder_gan.py --config_path TTS/vocoder/configs/my_parallel_wavegan_config.json
> Using CUDA: True
> Number of GPUs: 1
> Git Hash: 7beaacc
> Experiment folder: /home/vibe/tts/mozilla/Models/LJSpeech/pwgan-January-16-2021_11+48AM-7beaacc
> Loading wavs from: /home/vibe/tts/databases/LJSpeech-1.1/wavs/
> Setting up Audio Processor...
| > sample_rate:22050
| > resample:False
| > num_mels:80
| > min_level_db:-100
| > frame_shift_ms:None
| > frame_length_ms:None
| > ref_level_db:0
| > fft_size:1024
| > power:None
| > preemphasis:0.0
| > griffin_lim_iters:None
| > signal_norm:True
| > symmetric_norm:True
| > mel_fmin:50.0
| > mel_fmax:7600.0
| > spec_gain:1.0
| > stft_pad_mode:reflect
| > max_norm:4.0
| > clip_norm:True
| > do_trim_silence:True
| > trim_db:60
| > do_sound_norm:False
| > stats_path:/home/vibe/tts/databases/LJSpeech-1.1/scale_stats.npy
| > hop_length:256
| > win_length:1024
> Generator Model: parallel_wavegan_generator
> Discriminator Model: parallel_wavegan_discriminator
> Generator has 1320442 parameters
> Discriminator has 99842 parameters
> EPOCH: 0/10000
> TRAINING (2021-01-16 11:48:49)
/home/vibe/miniconda3/envs/tts/lib/python3.6/site-packages/torch/functional.py:516: UserWarning: stft will require the return_complex parameter be explicitly specified in a future PyTorch release. Use return_complex=False to preserve the current behavior or return_complex=True to return a complex output. (Triggered internally at /pytorch/aten/src/ATen/native/SpectralOps.cpp:653.)
normalized, onesided, return_complex)
! Run is removed from /home/vibe/tts/mozilla/Models/LJSpeech/pwgan-January-16-2021_11+48AM-7beaacc
Traceback (most recent call last):
  File "TTS/bin/train_vocoder_gan.py", line 654, in <module>
    main(args)
  File "TTS/bin/train_vocoder_gan.py", line 559, in main
    epoch)
  File "TTS/bin/train_vocoder_gan.py", line 152, in train
    feats_real, y_hat_sub, y_G_sub)
  File "/home/vibe/miniconda3/envs/tts/lib/python3.6/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/vibe/tts/mozilla/TTS_gerazov/TTS/vocoder/layers/losses.py", line 233, in forward
    stft_loss_mg, stft_loss_sc = self.stft_loss(y_hat.squeeze(1), y.squeeze(1))
  File "/home/vibe/miniconda3/envs/tts/lib/python3.6/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/vibe/tts/mozilla/TTS_gerazov/TTS/vocoder/layers/losses.py", line 70, in forward
    lm, lsc = f(y_hat, y)
  File "/home/vibe/miniconda3/envs/tts/lib/python3.6/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/vibe/tts/mozilla/TTS_gerazov/TTS/vocoder/layers/losses.py", line 46, in forward
    y_hat_M = self.stft(y_hat)
  File "/home/vibe/tts/mozilla/TTS_gerazov/TTS/vocoder/layers/losses.py", line 25, in __call__
    onesided=True)
  File "/home/vibe/miniconda3/envs/tts/lib/python3.6/site-packages/torch/functional.py", line 516, in stft
    normalized, onesided, return_complex)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!