p0p4k / pflowtts_pytorch

Unofficial implementation of NVIDIA P-Flow TTS paper
https://neurips.cc/virtual/2023/poster/69899
MIT License
198 stars 28 forks

NaN loss #31

Open 0913ktg opened 4 months ago

0913ktg commented 4 months ago

Hello p0p4k,

I've begun training a PFlow Korean model using the code you shared. However, I encountered a nan loss during the training process. I used a publicly available Korean dataset and structured the filelist in a single speaker format with filename|text.

Although the dataset contains over 2000 speakers, it lacks speaker labels, so I trained it using a single-speaker setting. I understand that differences in data and preprocessing methods might lead to various issues, but if you have any insights into the potential causes of nan loss, I would greatly appreciate your advice.

It's snowing heavily in Korea right now. Have a great day.


At first, learning seems to be going well, but then suddenly something goes wrong.


0913ktg commented 4 months ago

The training environment was CUDA 11.8, PyTorch 2.1.2, torchaudio 2.1.2, and torchvision 0.16.2, with DDP training on four NVIDIA A100-SXM4 (80 GB) cards. The data comprised 253K audio-text pairs with a batch size of 256, and the text was phonemized with the Korean grapheme-to-phoneme module g2pk. We are currently retraining the model with the batch size reduced to 64.

p0p4k commented 4 months ago

KSS dataset? It's snowing a lot today, so be careful!

p0p4k commented 4 months ago

Ah, it is not the KSS dataset but a multi-speaker dataset! Maybe there is too much variance; can you try taking a small subset of 3-4 speakers and training on that first?

p0p4k commented 4 months ago

In my case, I sometimes got NaN loss because of dataset issues.
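One way to catch such dataset issues before training is to scan every clip up front. Here is a hedged sketch (the `audio_is_suspect` helper and its threshold are mine, not part of this repo) that flags the usual NaN culprits: empty, non-finite, or near-silent audio whose log-mel would collapse toward `-inf`.

```python
import numpy as np

def audio_is_suspect(wav: np.ndarray, eps: float = 1e-4) -> bool:
    """Flag clips that commonly trigger NaN losses downstream:
    empty files, NaN/Inf samples, or near-silent audio
    (the log-mel of silence tends toward -inf)."""
    if wav.size == 0:
        return True
    if not np.isfinite(wav).all():
        return True
    if np.abs(wav).max() < eps:  # effectively silent clip
        return True
    return False

# Example usage over a filelist (audio loading left to your library):
# bad = [path for path in filelist if audio_is_suspect(load_wav(path))]
```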

0913ktg commented 4 months ago

After changing the batch size to 64, the model is no longer showing any NaN loss. I will continue to monitor and share the results.

Additionally, the original mel-spectrogram is added to TensorBoard with add_image without removing the zero-padding. It would be beneficial to crop the zero-padding using the batch's y_lengths.
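The crop itself is a one-liner; a minimal sketch (the `crop_padded_mel` helper is hypothetical, though `y` and `y_lengths` follow the names above):

```python
import numpy as np

def crop_padded_mel(mel: np.ndarray, length: int) -> np.ndarray:
    """Drop the zero-padded frames along the time axis before logging.
    mel: (n_mels, T_padded); length: the true frame count from y_lengths."""
    return mel[:, :length]

# e.g. before writer.add_image(...):
# mel_img = crop_padded_mel(y[i].cpu().numpy(), int(y_lengths[i]))
```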

Lastly, while GPU utilization was at 100% with your vits2 repo, this repo does not seem to utilize the GPU as efficiently.

I wanted to inquire if there are any ongoing developments related to this.

Thank you always for your prompt response.

p0p4k commented 4 months ago

About gpu usage, it might be because of dataloader. We might have to investigate that. Keep me updated with samples. Good day!
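If the dataloader is the bottleneck, the usual first knobs are the worker count and pinned memory. A generic PyTorch sketch (the dataset and parameter values are illustrative, not this repo's defaults):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Stand-in for the real TTS dataset.
dataset = TensorDataset(torch.randn(1024, 80))

loader = DataLoader(
    dataset,
    batch_size=64,
    num_workers=2,            # parallel CPU-side loading / feature extraction
    pin_memory=True,          # faster host-to-GPU copies
    persistent_workers=True,  # avoid re-forking workers every epoch
)

for (batch,) in loader:
    pass  # training step would go here
```

If utilization stays low after this, profiling one epoch (e.g. with `torch.profiler`) can show whether time is spent in I/O, collation, or the model itself.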

Tera2Space commented 4 months ago

Try disabling fp16 and training in fp32.
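The reason this helps: float16 tops out at 65504, so moderately large query/key dot products already overflow to `inf`, and the subsequent softmax turns `inf` into NaN. A tiny numpy demonstration:

```python
import numpy as np

# float16 cannot represent values above 65504.
q = np.full((1, 64), 32.0, dtype=np.float16)
k = np.full((1, 64), 32.0, dtype=np.float16)

logits = q @ k.T  # 64 * 32 * 32 = 65536 > 65504 -> inf in fp16

# The same matmul in float32 is perfectly fine:
logits32 = q.astype(np.float32) @ k.astype(np.float32).T  # 65536.0
```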

matteotesta commented 4 months ago

That is due to the matmul of the query and key overflowing in float16. You can find a solution to this problem in Sec. 2.4 of this paper (https://arxiv.org/pdf/2105.13290.pdf); see Eq. 4.
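For reference, that trick (PB-relax, CogView Eq. 4) rescales Q before the matmul so the fp16 intermediate stays in range, then restores the scale after subtracting the row max; since softmax is shift-invariant, the result is mathematically unchanged. A numpy sketch of the idea (my own illustration, not this repo's code):

```python
import numpy as np

def pb_relax_softmax_scores(q: np.ndarray, k: np.ndarray,
                            alpha: float = 32.0) -> np.ndarray:
    """Compute softmax(QK^T / sqrt(d)) without fp16 overflow, via
    softmax(((Q / (alpha*sqrt(d))) K^T - rowmax) * alpha)   (CogView, Eq. 4).
    Dividing Q by alpha keeps the fp16 matmul small; subtracting the
    row max before scaling back by alpha keeps the exponent bounded."""
    d = q.shape[-1]
    small = ((q / (alpha * np.sqrt(d))).astype(np.float16)
             @ k.astype(np.float16).T)          # safe fp16 matmul
    s = small.astype(np.float32)
    s = (s - s.max(axis=-1, keepdims=True)) * alpha
    e = np.exp(s)
    return e / e.sum(axis=-1, keepdims=True)
```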