I ran the command specified in the README (python train_finetune.py --config_path ./Configs/config_top19.yml) and got this error (which seems related to the DDP bug?):
Epochs: 12
Validation loss: 0.341, Dur loss: 0.620, F0 loss: 3.076
Saving..
Traceback (most recent call last):
File "/home/ck/git/StyleTTS2/train_finetune.py", line 714, in <module>
main()
File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
return self.main(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 1078, in main
rv = self.invoke(ctx)
File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 783, in invoke
return __callback(*args, **kwargs)
File "/home/ck/git/StyleTTS2/train_finetune.py", line 509, in main
loss_gen_lm.backward()
File "/opt/conda/lib/python3.10/site-packages/torch/_tensor.py", line 492, in backward
torch.autograd.backward(
File "/opt/conda/lib/python3.10/site-packages/torch/autograd/__init__.py", line 251, in backward
Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [1, 1, 1]] is at version 3; expected version 1 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).
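Following the hint in the error, my next step would be to rerun with anomaly detection enabled so the backward trace names the in-place op. A minimal sketch; the exact placement (near the top of main() in train_finetune.py, before the training loop) is my assumption:

import torch

# Slows training noticeably, but the backward pass will then report which
# forward op produced the tensor that was modified in place.
torch.autograd.set_detect_anomaly(True)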
The only changes I made to the config shouldn't be significant.
Since the error arose at the 12th epoch, I wonder if it has something to do with joint training kicking in around that point.
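To check that timing, I'd look at when the joint (SLM adversarial) stage starts in the config. A small sketch, assuming my config_top19.yml follows the stock config_ft.yml layout with diff_epoch/joint_epoch under loss_params (key names are my assumption):

import yaml

with open('./Configs/config_top19.yml') as f:
    cfg = yaml.safe_load(f)

loss_params = cfg.get('loss_params', {})
# diff_epoch: when style diffusion training starts; joint_epoch: when the
# SLM adversarial loss (loss_gen_lm, the one in the traceback) starts.
print('diff_epoch:', loss_params.get('diff_epoch'))
print('joint_epoch:', loss_params.get('joint_epoch'))

If joint_epoch lands right around epoch 12, that would support the idea that the in-place error only appears once loss_gen_lm.backward() starts being called.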