RuntimeError: stack expects a non-empty TensorList

Charlottecuc commented 3 years ago

Hi. Thank you very much for your implementation. I tried to extract the duration by using the default configs (the only difference is that a different dataset is used). However, after 9 iterations, the following error occurred:

  File "code/duration_extractor.py", line 539, in <module>
    logdir=logdir
  File "code/duration_extractor.py", line 390, in fit
    valid_losses = self._validate(valid_loader)
  File "code/duration_extractor.py", line 465, in _validate
    sound, length = self.collate.stft.spec2wav(spec.transpose(1, 2), slen[-1:])
  File "/data/glusterfs_speech_tts_core/11117873/models/speedyspeech_yige/code/stft.py", line 119, in spec2wav
    magnitudes = self.mel2linear(magnitudes)
  File "/data/glusterfs_speech_tts_core/11117873/models/speedyspeech_yige/code/stft.py", line 137, in mel2linear
    return nnls(self.mel_basis, mel)
  File "/data/glusterfs_speech_tts_core/11117873/models/speedyspeech_yige/code/stft.py", line 46, in nnls
    torch.nn.utils.clip_grad_norm_(X, 1)
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/utils/clip_grad.py", line 30, in clip_grad_norm_
    total_norm = torch.norm(torch.stack([torch.norm(p.grad.detach(), norm_type) for p in parameters]), norm_type)
RuntimeError: stack expects a non-empty TensorList

Could you help me solve this problem? Thank you ~

janvainer2 commented 3 years ago

Hi, thanks for your interest in this repo. Could you check whether you are able to extract the durations for the default LJSpeech dataset? Could you also print what the inputs to the nnls function look like (just add a print in your local copy of the repo)? Also, which checkpoint did you use for the duration extractor? Did you train your own, or did you use the default one provided with this project?
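
For anyone following along, here is a minimal sketch of such a print helper. The name `describe` and the argument names in the usage comment are made up for illustration and are not part of the repo; the helper only relies on the attributes every torch tensor has (`shape`, `dtype`, `device`, `requires_grad`, `grad`).

```python
def describe(name, t):
    """Return a one-line summary of a tensor-like object."""
    grad = getattr(t, "grad", None)
    return (
        f"{name}: {type(t).__name__} of size {list(t.shape)}, "
        f"dtype={t.dtype}, device={t.device}, "
        f"requires_grad={getattr(t, 'requires_grad', False)}, "
        f"has_grad={grad is not None}"
    )

# Intended use inside a local copy of nnls (argument names are guesses):
#     print(describe("A", A))
#     print(describe("B", B))
#     print(describe("X", X))
```

The `has_grad` field is there because the traceback ends in gradient clipping, so it is worth knowing whether the tensor being clipped has a gradient at that point.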

adnan-mehremic commented 3 years ago

I got the same error after running this command: python code/duration_extractor.py

Traceback (most recent call last):
  File "code/duration_extractor.py", line 534, in <module>
    logdir=logdir
  File "code/duration_extractor.py", line 390, in fit
    valid_losses = self._validate(valid_loader)
  File "code/duration_extractor.py", line 461, in _validate
    sound, length = self.collate.stft.spec2wav(spec.transpose(1, 2), slen[-1:])
  File "/home/ubuntu/speedyspeech/code/stft.py", line 119, in spec2wav
    magnitudes = self.mel2linear(magnitudes)
  File "/home/ubuntu/speedyspeech/code/stft.py", line 137, in mel2linear
    return nnls(self.mel_basis, mel)
  File "/home/ubuntu/speedyspeech/code/stft.py", line 46, in nnls
    torch.nn.utils.clip_grad_norm_(X, 1)
  File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/torch/nn/utils/clip_grad.py", line 30, in clip_grad_norm_
    total_norm = torch.norm(torch.stack([torch.norm(p.grad.detach(), norm_type) for p in parameters]), norm_type)
RuntimeError: stack expects a non-empty TensorList

janvainer2 commented 3 years ago

Are you training on GPU or CPU? I will need more information to reproduce the error.

adnan-mehremic commented 3 years ago

OK, I tried a few times and always got the same error. I followed all your steps, and after running python code/duration_extractor.py I got this error (as you can see, the model was sent to CUDA):

ubuntu@ip-172-31-68-24:~/speedyspeech$ python code/duration_extractor.py
Model sent to cuda
13000/13000: [===============================>] - ETA 1.6sss
Epoch 1 | Train - l1: 0.09392118094296291, guided_att: 0.00031112270836037095| Valid - l1: 0.3166225552558899, guided_att: 0.0004626042937161401|
13000/13000: [===============================>] - ETA 1.0sss
Epoch 2 | Train - l1: 0.06905996212231115, guided_att: 0.0002700031827905862| Valid - l1: 0.3054344058036804, guided_att: 0.00043494933925103396|
13000/13000: [===============================>] - ETA 1.0sss
Epoch 3 | Train - l1: 0.06594225224749796, guided_att: 0.00026452819020498836| Valid - l1: 0.32097506523132324, guided_att: 0.00046123971696943045|
13000/13000: [===============================>] - ETA 1.1sss
Epoch 4 | Train - l1: 0.06372856097341759, guided_att: 0.0002559272787021014| Valid - l1: 0.32438914477825165, guided_att: 0.00048450268513988703|
13000/13000: [===============================>] - ETA 1.0sss
Epoch 5 | Train - l1: 0.06199859332274921, guided_att: 0.0002551149550952669| Valid - l1: 0.3171471357345581, guided_att: 0.0004896632890449837|
13000/13000: [===============================>] - ETA 1.0sss
Epoch 6 | Train - l1: 0.06050542716322274, guided_att: 0.0002568568125380928| Valid - l1: 0.2853122800588608, guided_att: 0.00046930725511629134|
13000/13000: [===============================>] - ETA 1.0sss
Epoch 7 | Train - l1: 0.05929661129275566, guided_att: 0.0002494556063744859| Valid - l1: 0.25290364027023315, guided_att: 0.0005208489892538637|
13000/13000: [===============================>] - ETA 1.0sss
Epoch 8 | Train - l1: 0.05856953240160284, guided_att: 0.00024662175923448256| Valid - l1: 0.39512471854686737, guided_att: 0.0008473480411339551|
13000/13000: [===============================>] - ETA 1.0sss
Epoch 9 | Train - l1: 0.05783513459959641, guided_att: 0.00024235204981612455| Valid - l1: 0.32342180609703064, guided_att: 0.0010448592656757683|
13000/13000: [===============================>] - ETA 1.0sss
Traceback (most recent call last):
  File "code/duration_extractor.py", line 534, in <module>
    logdir=logdir
  File "code/duration_extractor.py", line 390, in fit
    valid_losses = self._validate(valid_loader)
  File "code/duration_extractor.py", line 461, in _validate
    sound, length = self.collate.stft.spec2wav(spec.transpose(1, 2), slen[-1:])
  File "/home/ubuntu/speedyspeech/code/stft.py", line 119, in spec2wav
    magnitudes = self.mel2linear(magnitudes)
  File "/home/ubuntu/speedyspeech/code/stft.py", line 137, in mel2linear
    return nnls(self.mel_basis, mel)
  File "/home/ubuntu/speedyspeech/code/stft.py", line 46, in nnls
    torch.nn.utils.clip_grad_norm_(X, 1)
  File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/torch/nn/utils/clip_grad.py", line 30, in clip_grad_norm_
    total_norm = torch.norm(torch.stack([torch.norm(p.grad.detach(), norm_type) for p in parameters]), norm_type)
RuntimeError: stack expects a non-empty TensorList

janvainer commented 3 years ago

@adnan-mehremic Thanks for the info, I will try to replicate this during the weekend

dsplog commented 3 years ago

@janvainer: following https://github.com/pytorch/pytorch/issues/38605, I moved to torch==1.5.1 and the issue no longer appears. I still have to read up to understand what is going on.

janvainer commented 3 years ago

Thanks for the link. My problem with this issue is that I am not able to reproduce it, even with a clean setup and reinstalled dependencies; everything works for me even with torch==1.5.0. One possible cause is that the requirements installation failed the last time I tried, and I had to install numpy and some other numeric packages separately. Could you please check that your installed dependencies are exactly the same as in the requirements? Or just post them here and I will check. There may be a dependency version conflict that arises when the packages are installed all at once.
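
One way to run that check, as a rough sketch: compare every exact `name==version` pin against what is actually installed. The function name `find_mismatches` is hypothetical, and the sketch assumes the pins are plain `==` lines; it ignores ranges, extras and environment markers. `importlib.metadata` is in the standard library from Python 3.8 on, so on the Python 3.7 seen in the tracebacks the `importlib_metadata` backport would be needed instead.

```python
from importlib import metadata  # standard library from Python 3.8 on


def find_mismatches(requirement_lines, installed_version=metadata.version):
    """Return {package: (pinned, installed)} for each `name==version` pin that is not met."""
    mismatches = {}
    for raw in requirement_lines:
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if "==" not in line:
            continue  # only exact pins are checked in this sketch
        name, pinned = (part.strip() for part in line.split("==", 1))
        try:
            found = installed_version(name)
        except metadata.PackageNotFoundError:
            found = None  # package is not installed at all
        if found != pinned:
            mismatches[name] = (pinned, found)
    return mismatches
```

Passing the open requirements file (its lines are iterated directly) returns a dictionary that is empty when every pin matches, and otherwise lists each package with its pinned and installed version, which is the kind of output that could be pasted here.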

pmunaretto commented 3 years ago

Thank you for the awesome project! I had the same problem when training the model for another language, and moving to torch==1.5.1 fixed it for me. All the packages matched the ones in the requirements.

Here is some info on the tensors from the nnls function:

mel_basis:  torch.Tensor of size [80, 513]
 mel_spec:  torch.Tensor of size [1, 80, 1128]
        X:  torch.Tensor of size [1, 513, 1128]

In both torch versions the tensors are the same. However, with 1.5.0, torch.nn.utils.clip_grad_norm_ seems to fail with the error mentioned above.
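
A possible explanation, sketched under assumptions: the traceback shows torch.stack being called on a list built from `p.grad` for each parameter. Since X is a single tensor, that list can only be empty if X was dropped beforehand for having no gradient, which would be the case if clipping runs before any backward call. The filtering step below is my reading of what clip_grad_norm_ does, not something confirmed in this thread.

```python
import torch

# Same shape as the X reported above; the values do not matter here.
X = torch.zeros(1, 513, 1128, requires_grad=True)

# No backward() call has happened yet, so X carries no gradient.
grad_missing = X.grad is None

# Assumed behaviour of clip_grad_norm_: keep only parameters that have a gradient.
with_grad = [p for p in [X] if p.grad is not None]

# The norm computation from the traceback, applied to that (empty) list.
try:
    torch.stack([torch.norm(p.grad.detach(), 2.0) for p in with_grad])
    stack_failed = False
except RuntimeError as err:
    stack_failed = True
    print(f"torch.stack failed: {err}")
```

If this is what happens, torch==1.5.1 presumably handles the empty case instead of passing it on to torch.stack, which would explain why the same tensors work there.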

janvainer commented 3 years ago

Thanks for trying this out! I will check if version 1.5.1 works for me and will bump up the requirement.

DanielJean007 commented 2 years ago

Hi all.

Just to report: I had the same problem, and updating to torch==1.5.1 did indeed solve it. However, in another project I saw a different solution: https://github.com/audio-captioning/dcase-2020-baseline/issues/7. The fix there was to run the backward pass before clipping the gradient. I noticed that this code has the same ordering as the code in that issue: first clip, then backward. Perhaps swapping the order of these calls would solve the problem for good?
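
To illustrate the ordering I mean, here is a small sketch. It is not the nnls from stft.py: `nnls_sketch` is a made-up 2-D stand-in that uses plain SGD with a clamp to keep X non-negative, and the step count, learning rate and max_norm are arbitrary. The only point is the order backward, then clip, then step.

```python
import torch


def nnls_sketch(A, B, steps=200, lr=0.1, max_norm=1.0):
    """Hypothetical stand-in: minimise ||A X - B||^2 subject to X >= 0."""
    X = torch.zeros(A.shape[1], B.shape[1], requires_grad=True)
    optimizer = torch.optim.SGD([X], lr=lr)
    for _ in range(steps):
        optimizer.zero_grad()
        loss = torch.nn.functional.mse_loss(A @ X, B)
        loss.backward()                                # populates X.grad
        torch.nn.utils.clip_grad_norm_([X], max_norm)  # X.grad exists, so there is something to clip
        optimizer.step()
        with torch.no_grad():
            X.clamp_(min=0)                            # project back onto X >= 0
    return X.detach()
```

With this order the clip call always sees a gradient, so it should not depend on how a particular torch version treats an empty parameter list.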

janvainer / speedyspeech

RuntimeError: stack expects a non-empty TensorList #18