yl4579 / StyleTTS2

StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models
MIT License

train_second.py model.decoder error (output tensor is nan) #193

Open junylee11 opened 5 months ago

junylee11 commented 5 months ago

The g_loss value in "train_second.py" becomes NaN. Debugging shows that the output of the model.decoder() call is NaN (lines 391 and 402). There was no problem in train_first.py, and I don't know why this happens in train_second.py.

If you know how to fix this error, please help me. Thank you.

[screenshots attached]

Training config:

```yaml
log_dir: "C:\Users\user\Desktop\styleTTS2_test_data"
first_stage_path: "first_stage.pth"
save_freq: 2
log_interval: 10
device: "cuda"
epochs_1st: 200 # number of epochs for first stage training (pre-training)
epochs_2nd: 100 # number of epochs for second stage training (joint training)
batch_size: 4
max_len: 200 # maximum number of frames
pretrained_model: "C:\Users\user\Desktop\styleTTS2_test_data\epoch_1st_00170.pth"
second_stage_load_pretrained: true # set to true if the pre-trained model is for 2nd stage
load_only_params: true # set to true if do not want to load epoch numbers and optimizer parameters
```

Config for decoder:

```yaml
decoder:
  type: 'istftnet' # either hifigan or istftnet
  resblock_kernel_sizes: [3, 7, 11]
  upsample_rates: [10, 6]
  upsample_initial_channel: 512
  resblock_dilation_sizes: [[1,3,5], [1,3,5], [1,3,5]]
  upsample_kernel_sizes: [20, 12]
  gen_istft_n_fft: 20
  gen_istft_hop_size: 5
```

akshatgarg99 commented 5 months ago

Same issue

effusiveperiscope commented 5 months ago

I have experienced this before in a few situations:

yl4579 commented 4 months ago

Have you checked whether any of F0_fake, N_fake, s, or en are NaN?
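A minimal way to check this is to assert that each tensor is finite right before the decoder call. The sketch below is not from the repo; it assumes the variable names quoted in this thread (en, s, F0_fake, N_fake) and plain PyTorch tensors:

```python
import torch

def assert_finite(name: str, t: torch.Tensor) -> None:
    """Raise with a short report if the tensor contains NaN or Inf
    (an Inf often appears one step before the first NaN)."""
    if not torch.isfinite(t).all():
        raise RuntimeError(
            f"{name} is not finite: "
            f"has_nan={torch.isnan(t).any().item()}, "
            f"has_inf={torch.isinf(t).any().item()}"
        )

# Hypothetical placement in train_second.py, just before the decoder call
# (variable names and argument order follow this thread; adapt to your local script):
#   for n, t in {"en": en, "s": s, "F0_fake": F0_fake, "N_fake": N_fake}.items():
#       assert_finite(n, t)
#   y_rec = model.decoder(en, F0_fake, N_fake, s)
#   assert_finite("y_rec", y_rec)
```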

suryasubbu commented 1 month ago

> Have you checked whether any of F0_fake, N_fake, s, or en are NaN?

None of the above are NaN. The problem starts in the model.decoder() call, where y_rec_gt_pred becomes NaN even though the arguments are not NaN.
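If the inputs really are finite but the output is not, forward hooks on the decoder's submodules can show which layer first produces a non-finite activation. This is a hypothetical debugging sketch, not part of train_second.py; it only assumes that model.decoder is a regular torch.nn.Module:

```python
import torch
from torch import nn

def install_nan_hooks(module: nn.Module):
    """Register forward hooks on every submodule; the first hook that sees a
    non-finite output prints that submodule's name and class, then further
    reports are suppressed. Returns the handles so the hooks can be removed."""
    state = {"reported": False}
    handles = []

    def make_hook(name):
        def hook(mod, inputs, output):
            if state["reported"]:
                return
            outs = output if isinstance(output, (tuple, list)) else (output,)
            for o in outs:
                if torch.is_tensor(o) and not torch.isfinite(o).all():
                    state["reported"] = True
                    print(f"[non-finite output] {name or '<root>'} "
                          f"({mod.__class__.__name__})")
                    break
        return hook

    for name, sub in module.named_modules():
        handles.append(sub.register_forward_hook(make_hook(name)))
    return handles

# Hypothetical usage around the failing call in train_second.py:
#   handles = install_nan_hooks(model.decoder)
#   y_rec_gt_pred = model.decoder(en, F0_fake, N_fake, s)  # argument order per your local script
#   for h in handles:
#       h.remove()
```

Since leaf modules finish their forward pass before their parents, the first report points at the deepest layer whose output went non-finite, which narrows down where inside the decoder the values blow up.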