yl4579 / StyleTTS2

StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models
MIT License

`g_loss` is NaN because of model.predictor_encoder and model.decoder #284

Closed (xorium closed this issue 2 months ago)

xorium commented 2 months ago

Hi! Thank you for your work!

I'm trying to run second-stage training (single speaker, non-English language), and already in epoch 0 the code hits the set_trace() line. I checked the recommendations from here, and none of the points seem to apply: I'm definitely using multilingual-PL-BERT, it's epoch 0, I haven't changed the code, and first-stage training finished without any errors.

So I tried debugging the code to find the reason, and I noticed that the problem starts with model.predictor_encoder and model.decoder returning NaN tensors even though their input data contains no NaN values.

I don't have enough knowledge to understand the cause further. Could you at least guide me on where I should look next? Thank you!

martinambrus commented 2 months ago

Try checking this issue along with its pull request to see if it helps: https://github.com/yl4579/StyleTTS2/issues/254

xorium commented 2 months ago

Thank you! This really helps!