yl4579 / StyleTTS2

StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models
MIT License

Help Wanted For Stage-1 #239

Open xujzouyyz opened 6 months ago

xujzouyyz commented 6 months ago

I tried to train the first stage on the LJSpeech dataset provided by the developer, using the default config file. However, the mel loss decreases to 0.5 and then becomes NaN after 25 epochs. Why does this happen?

Karesto commented 5 months ago

After 25 epochs?

What's your batch size and data? It's possible that the TMA stage of training starts at that point (the epoch at which it starts should be set in your config file).
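
To check this suggestion against a training log, a small helper along the following lines could find the first epoch with a non-finite loss and compare it with the configured TMA start. This is only a sketch: the function names, the 1-based epoch numbering, and the `window` tolerance are assumptions made for illustration, and the per-epoch loss list has to be extracted from your own logs.

```python
import math

def first_nonfinite_epoch(losses):
    """Return the 1-based epoch of the first NaN/inf loss, or None if all are finite."""
    for epoch, loss in enumerate(losses, start=1):
        if not math.isfinite(loss):
            return epoch
    return None

def coincides_with_tma(losses, tma_epoch, window=2):
    """True if the first non-finite loss appears within `window` epochs
    after the TMA start epoch (hypothetical tolerance, adjust as needed)."""
    bad_epoch = first_nonfinite_epoch(losses)
    if bad_epoch is None:
        return False
    return tma_epoch <= bad_epoch <= tma_epoch + window

# Made-up per-epoch mel losses, for illustration only.
example_losses = [1.0, 0.7, 0.5, float("nan")]
print(first_nonfinite_epoch(example_losses))
print(coincides_with_tma(example_losses, tma_epoch=3))
```

If the two epochs line up, the divergence is more likely tied to the losses enabled at the TMA stage than to the mel loss itself.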

kushbhatia commented 4 months ago

I am facing a similar issue. I am trying to reproduce the results of the paper by training on LJSpeech with a single GPU. As soon as training enters the TMA stage, the Gen and Dis losses start blowing up within 1-2 epochs and eventually become NaN. I am using a batch size of 16 and a learning rate of 1e-4. This is in the first stage of training.

Can you let me know how to stabilize this part of the training?

martinambrus commented 2 months ago

Perhaps issue https://github.com/yl4579/StyleTTS2/issues/254 as well as its connected PR https://github.com/yl4579/StyleTTS2/pull/253 could solve this - it did solve NaN value errors for me, although it was for 2nd stage training on a single GPU.
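
Independently of the linked issue and PR (whose contents are not reproduced here), a generic safeguard against the kind of loss blow-up described in this thread is to skip updates whose gradients are non-finite and to clip the global gradient norm. In PyTorch, clipping is usually done with `torch.nn.utils.clip_grad_norm_`; the sketch below shows the same arithmetic in plain Python so it runs without dependencies. The function name and the `max_norm` value are assumptions for illustration, not settings taken from the StyleTTS2 repository.

```python
import math

def clip_by_global_norm(grads, max_norm):
    """Scale a flat list of gradient values so their L2 norm is at most max_norm.

    Returns None when the norm is NaN or inf, which the caller can treat
    as a signal to skip the optimizer step for this batch.
    """
    total_norm = math.sqrt(sum(g * g for g in grads))
    if not math.isfinite(total_norm):
        return None
    # Small epsilon avoids division by zero when all gradients are zero.
    scale = min(1.0, max_norm / (total_norm + 1e-6))
    return [g * scale for g in grads]

clipped = clip_by_global_norm([3.0, 4.0], max_norm=1.0)    # norm 5.0, scaled down
skipped = clip_by_global_norm([float("nan"), 1.0], max_norm=1.0)  # None: skip step
```

Clipping bounds the size of each update, and skipping non-finite steps keeps a single bad batch from poisoning the weights. Neither addresses the underlying cause, so it is still worth checking the learning rate and batch size.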