winddori2002 / DEX-TTS

DEX-TTS: Diffusion-based EXpressive TTS with Style Modeling on Time Variability
MIT License

Question about Early Overfitting in GeDEX-TTS Training with VCTK #5

Open thunn opened 2 months ago

thunn commented 2 months ago

Hello,

I recently ran the GeDEX-TTS training using the VCTK dataset and followed the instructions provided in the GeDEX-TTS/config/VCTK/base.yaml configuration file. The only modifications I made were:

While plotting the training logs, I observed that the model started to overfit very early in the training process. This is somewhat unexpected to me, and I wanted to inquire whether this behaviour is expected or if there might be an issue with my setup.

Here is a basic plot of the training losses.

[Plot: GeDex-TTS on VCTK]

Could you please provide insights into whether early overfitting is a known issue with this configuration, or if there are any recommended adjustments to prevent this?

Thank you for your help!

winddori2002 commented 2 months ago

Hi,

Have you listened to the synthesized samples?

The duration loss can increase during the validation phase.

Although the diffusion loss does not improve much during the training phase, more training steps usually yield better sample quality, as in Grad-TTS.

Thanks.
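
To see which loss term is behind the apparent overfitting, it may help to look at each validation curve separately instead of the summed loss. Below is a minimal sketch, assuming the validation losses have been exported as one list per loss term with one value per epoch. The key names (`dur`, `diff`), the `patience` threshold, and all numbers are hypothetical and are not taken from the DEX-TTS logger or from the plot above.

```python
def best_epoch(values):
    """Return the index of the lowest value (first one on ties)."""
    return min(range(len(values)), key=values.__getitem__)


def rising_components(val_logs, patience=2):
    """Return the loss terms whose validation value has not reached
    a new minimum for at least `patience` epochs."""
    rising = []
    for name, values in val_logs.items():
        epochs_since_best = len(values) - 1 - best_epoch(values)
        if epochs_since_best >= patience:
            rising.append(name)
    return rising


# Hypothetical per-epoch validation losses, for illustration only.
val_logs = {
    "dur": [0.90, 0.60, 0.55, 0.58, 0.63, 0.70],   # turns upward after epoch 2
    "diff": [1.20, 1.10, 1.08, 1.07, 1.07, 1.06],  # flat, still improving slightly
}

print(rising_components(val_logs))
```

With curves like these, only the duration term would be flagged. That matches the situation described in the reply: a rising validation duration loss alongside a flat diffusion loss. Listening to synthesized samples remains the more reliable check.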