yl4579 / StyleTTS2

StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models

Finetune also suffers "one of the variables needed for gradient computation has been modified by an inplace operation" #73

Closed kmn1024 closed 12 months ago

kmn1024 commented 12 months ago

I ran the command specified in the README (python train_finetune.py --config_path ./Configs/config_top19.yml) and got this error, which seems related to the DDP bug:

Epochs: 12
Validation loss: 0.341, Dur loss: 0.620, F0 loss: 3.076

Saving..
Traceback (most recent call last):
  File "/home/ck/git/StyleTTS2/train_finetune.py", line 714, in <module>
    main()
  File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/home/ck/git/StyleTTS2/train_finetune.py", line 509, in main
    loss_gen_lm.backward()
  File "/opt/conda/lib/python3.10/site-packages/torch/_tensor.py", line 492, in backward
    torch.autograd.backward(
  File "/opt/conda/lib/python3.10/site-packages/torch/autograd/__init__.py", line 251, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [1, 1, 1]] is at version 3; expected version 1 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).
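For context on what this error means: autograd versions every tensor it saves for the backward pass, and raises this RuntimeError if such a tensor was mutated in place before backward() runs. A minimal sketch reproducing the same failure mode (unrelated to StyleTTS2's code, just an illustration of the mechanism):

```python
import torch

a = torch.ones(3, requires_grad=True)
b = a.sigmoid()   # sigmoid saves its output tensor for the backward pass
b.mul_(2)         # in-place op bumps b's version counter (1 -> 2)

try:
    b.sum().backward()  # backward sees the version mismatch and raises
except RuntimeError as e:
    print(type(e).__name__, "-", "inplace operation" in str(e))
```

As the hint in the traceback says, wrapping the training step in torch.autograd.set_detect_anomaly(True) makes the eventual error point at the forward-pass op that produced the modified tensor, at the cost of much slower training.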

The only changes I made to the config shouldn't be significant:

...
5,6c5,6                                                                 
< epochs: 20
< batch_size: 16
---
> epochs: 50 # number of finetuning epoch (1 hour of data)
> batch_size: 8
95,96c95,96
<     diff_epoch: 4 # style diffusion starting epoch
<     joint_epoch: 12 # joint training starting epoch
---
>     diff_epoch: 10 # style diffusion starting epoch
>     joint_epoch: 30 # joint training starting epoch
...

Since the error arose at the 12th epoch, I wonder if it has something to do with joint training.

kmn1024 commented 12 months ago

Duplicate issue: https://github.com/yl4579/StyleTTS2/issues/72