yl4579 / StyleTTS2

StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models
MIT License

Trying to fine tune with SLM adversarial training #99

Closed by jazza420 10 months ago

jazza420 commented 10 months ago

Training starts off fine, but when I reach joint_epoch I get the following error:

Traceback (most recent call last):
  File "/workspace/StyleTTS2/train_finetune.py", line 707, in <module>
    main()
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/workspace/StyleTTS2/train_finetune.py", line 493, in main
    ref_lengths, use_ind, s_trg.detach(), ref if multispeaker else None)
UnboundLocalError: local variable 'ref' referenced before assignment

Can anyone help me? Am I missing something? Thanks.

yl4579 commented 10 months ago

Your diff_epoch is bigger than joint_epoch.
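
To illustrate why that ordering matters, here is a minimal sketch, assuming `ref` is only assigned once the epoch reaches diff_epoch while the SLM adversarial step that reads it starts at joint_epoch. The functions `run_epochs` and `check_epoch_order` are hypothetical and are not part of train_finetune.py; they only mimic the control flow the traceback suggests.

```python
def run_epochs(epochs, diff_epoch, joint_epoch, multispeaker=True):
    """Toy loop mirroring the assumed control flow; not the real training code."""
    for epoch in range(epochs):
        if multispeaker and epoch >= diff_epoch:
            # Stand-in for the reference style computed during diffusion training.
            ref = f"style-ref-{epoch}"
        if epoch >= joint_epoch:
            # Raises UnboundLocalError if this branch runs before `ref` was assigned.
            _ = ref if multispeaker else None
    return "ok"


def check_epoch_order(diff_epoch, joint_epoch):
    """Hypothetical guard: fail early if joint training would start first."""
    if diff_epoch > joint_epoch:
        raise ValueError(
            f"diff_epoch ({diff_epoch}) must not be larger than "
            f"joint_epoch ({joint_epoch})"
        )


if __name__ == "__main__":
    check_epoch_order(diff_epoch=1, joint_epoch=2)  # passes silently
    print(run_epochs(3, diff_epoch=1, joint_epoch=2))
    try:
        run_epochs(3, diff_epoch=2, joint_epoch=1)
    except UnboundLocalError as err:
        print("reproduced:", type(err).__name__)
```

Under this assumption, keeping diff_epoch less than or equal to joint_epoch in the config means `ref` always exists by the time the joint stage reads it.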

GUUser91 commented 4 months ago

@jazza420 I found a workaround by setting multispeaker to false.

@yl4579 Is setting multispeaker to false a bad idea if I fine-tune with SLM adversarial training using a diffusion-trained model as the base model?
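
For context on why the workaround avoids the error: the failing line in the traceback passes `ref if multispeaker else None`, and a Python conditional expression only evaluates the branch it selects. The sketch below uses a hypothetical `pick_ref` helper (not from the repository) to show that an unassigned `ref` is never touched when multispeaker is false. It says nothing about whether disabling multispeaker is appropriate for a given dataset, and it leaves the diff_epoch / joint_epoch ordering unchanged.

```python
def pick_ref(multispeaker, assign_ref):
    """Hypothetical helper: `ref` exists only when assign_ref is true."""
    if assign_ref:
        ref = "style-ref"
    # Only the selected branch of the conditional expression is evaluated,
    # so `ref` is not read at all when multispeaker is False.
    return ref if multispeaker else None


if __name__ == "__main__":
    print(pick_ref(multispeaker=False, assign_ref=False))  # no error
    try:
        pick_ref(multispeaker=True, assign_ref=False)
    except UnboundLocalError as err:
        print("still fails with multispeaker=True:", type(err).__name__)
```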