keonlee9420 / DiffGAN-TTS

PyTorch Implementation of DiffGAN-TTS: High-Fidelity and Efficient Text-to-Speech with Denoising Diffusion GANs
MIT License
320 stars 44 forks source link

Why minmize l1(\hat{x_0}, x_0)+l1(\hat{x_1}, x_0) when optimizing aux model? #14

Open caisikai opened 2 years ago

caisikai commented 2 years ago

Hi, keonlee. Thanks for sharing code! I found that when training aux model, we get \hat{x_0} from G, then diffuse it to \hat{x_1}, finally get a prediciton list [ \hat{x_0}, \hat{x_1}]. When calculating mel loss, add l1 loss of them with target. It confuse me. I understand l1(x_0, \hat{x_0}). But why not l1(x_1, \hat{x_1}).