sh-lee-prml / HierSpeechpp

The official implementation of HierSpeech++
MIT License

Why did you decrease the kl_loss weight from 1.0 (VITS) to 0.1? #25

Closed · segmentationFaults closed this 5 months ago

segmentationFaults commented 5 months ago

Can you share your insight?

Gabibing commented 5 months ago

I'm not the author, but in my opinion, when the difference between the two distributions is significant, the KL loss can be high. Trying to force the distributions to be identical by assigning a high weight could lead to unfavorable outcomes.

Edit: So the goal is not to drive the KL loss to 0, but to maintain an ideal divergence between the two distributions through appropriate weighting.
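To make that concrete, here is a minimal sketch of a weighted objective (the names `loss_recon`, `loss_kl`, and `kl_weight` are illustrative, not identifiers from this repository):

```python
import torch

def weighted_objective(loss_recon: torch.Tensor,
                       loss_kl: torch.Tensor,
                       kl_weight: float = 0.1) -> torch.Tensor:
    # Scaling the KL term down (rather than weighting it at 1.0)
    # still regularizes the posterior toward the prior, but without
    # forcing the two distributions to collapse into one.
    return loss_recon + kl_weight * loss_kl
```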

sh-lee-prml commented 5 months ago

Hi! @segmentationFaults

As @Gabibing said, we observed that the model could not reconstruct the wav2vec representation with a KL weight of 1.0. (However, that model could still synthesize speech from text!)

However, I acknowledge that using a weight of 1.0 is natural from the perspective of the normalizing flow and the text encoder (the weights of both modules are optimized only with the KL loss).
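For context, the KL term in question can be sketched following the reference VITS implementation (HierSpeech++'s actual loss code may differ): a masked divergence between the flow-transformed posterior sample and the text-conditioned prior.

```python
import torch

def kl_loss(z_p, logs_q, m_p, logs_p, z_mask):
    """Masked KL between the posterior and the text-conditioned prior,
    following the reference VITS implementation.

    z_p:    posterior sample after the normalizing flow, [b, h, t]
    logs_q: posterior log-std, [b, h, t]
    m_p:    prior mean from the text encoder, [b, h, t]
    logs_p: prior log-std from the text encoder, [b, h, t]
    z_mask: valid-frame mask, [b, 1, t]
    """
    kl = logs_p - logs_q - 0.5
    kl += 0.5 * ((z_p - m_p) ** 2) * torch.exp(-2.0 * logs_p)
    kl = torch.sum(kl * z_mask)
    return kl / torch.sum(z_mask)
```

In the reference VITS training script, this term is multiplied by the `c_kl` hyperparameter (1.0 by default) before being added to the generator loss; that multiplier is the weight under discussion here.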

In my case, I wanted to improve the reconstruction quality, so I simply decreased the weight of the KL loss.

For the LibriTTS dataset, we found that a weight of 0.1 performs better than weights of 1.0 or 0.5.

Thanks!

And thanks, @Gabibing!

segmentationFaults commented 5 months ago

OK, thanks