jik876 / hifi-gan

HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis
MIT License

mel spectrum loss VS stft loss #31

Closed nukes closed 3 years ago

nukes commented 3 years ago

Hi, thanks for your great work. In the paper you mention that the mel-spectrogram loss makes training more stable and efficient. The multi-resolution STFT loss used in parallel-wavegan/mb-melgan seems to achieve the same goal. My question is: have you tried the STFT loss instead of the mel loss? If so, did you observe any differences?

Thanks.

jik876 commented 3 years ago

Thanks for your interest. We experimented with STFT loss and observed that the quality was worse than with the mel-spectrogram loss in the early stage.
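For readers comparing the two objectives discussed in this thread, here is a minimal NumPy sketch of each: an L1 distance between log-mel spectrograms, and a multi-resolution STFT loss (spectral convergence plus log-magnitude L1, averaged over several STFT settings). It is not the code from this repository or from parallel-wavegan. The function names, the HTK-style mel filterbank, the clamping constants, and the default STFT settings are all assumptions chosen for illustration, and NumPy provides no gradients, so a training setup would need the same computation in an autodiff framework.

```python
import numpy as np

def stft_mag(x, n_fft, hop, win_len):
    """Magnitude STFT with a Hann window; shape (frames, n_fft // 2 + 1)."""
    window = np.hanning(win_len)
    n_frames = 1 + (len(x) - win_len) // hop
    frames = np.stack(
        [x[i * hop : i * hop + win_len] * window for i in range(n_frames)]
    )
    # rfft zero-pads each frame to n_fft when n_fft > win_len.
    return np.abs(np.fft.rfft(frames, n=n_fft, axis=1))

def hz_to_mel(f):
    # HTK-style mel scale (an assumption; other conventions exist).
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels, fmin=0.0, fmax=None):
    """Triangular mel filters; shape (n_mels, n_fft // 2 + 1)."""
    fmax = fmax or sr / 2
    hz_pts = mel_to_hz(np.linspace(hz_to_mel(fmin), hz_to_mel(fmax), n_mels + 2))
    freqs = np.linspace(0.0, sr / 2, n_fft // 2 + 1)
    fb = np.zeros((n_mels, len(freqs)))
    for m in range(n_mels):
        lo, ctr, hi = hz_pts[m], hz_pts[m + 1], hz_pts[m + 2]
        rising = (freqs - lo) / (ctr - lo)
        falling = (hi - freqs) / (hi - ctr)
        fb[m] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

def mel_l1_loss(y, y_hat, sr=22050, n_fft=1024, hop=256, win_len=1024,
                n_mels=80, eps=1e-5):
    """Mean absolute difference between log-mel spectrograms."""
    fb = mel_filterbank(sr, n_fft, n_mels)

    def log_mel(x):
        mel = stft_mag(x, n_fft, hop, win_len) @ fb.T
        return np.log(np.maximum(mel, eps))  # clamp to avoid log(0)

    return float(np.mean(np.abs(log_mel(y) - log_mel(y_hat))))

def multi_res_stft_loss(
    y, y_hat,
    # (n_fft, hop, win_len) triples; illustrative values only.
    resolutions=((1024, 120, 600), (2048, 240, 1200), (512, 50, 240)),
    eps=1e-7,
):
    """Spectral convergence + log-magnitude L1, averaged over resolutions."""
    total = 0.0
    for n_fft, hop, win_len in resolutions:
        m = stft_mag(y, n_fft, hop, win_len)
        m_hat = stft_mag(y_hat, n_fft, hop, win_len)
        # Frobenius-norm ratio (np.linalg.norm default for 2-D arrays).
        spectral_convergence = np.linalg.norm(m - m_hat) / max(np.linalg.norm(m), eps)
        log_mag = np.mean(
            np.abs(np.log(np.maximum(m, eps)) - np.log(np.maximum(m_hat, eps)))
        )
        total += spectral_convergence + log_mag
    return float(total / len(resolutions))

# Usage: compare a clean tone against a noisy copy of itself.
rng = np.random.default_rng(0)
t = np.arange(8192) / 22050.0
clean = np.sin(2 * np.pi * 440.0 * t)
noisy = clean + 0.05 * rng.standard_normal(len(clean))
print("mel L1 loss:", mel_l1_loss(clean, noisy))
print("multi-resolution STFT loss:", multi_res_stft_loss(clean, noisy))
```

Both functions return 0 for identical inputs and grow as the spectra diverge. The main structural difference is that the mel loss compresses the frequency axis into perceptually spaced bands at a single resolution, while the STFT loss keeps every linear-frequency bin and averages over several time-frequency trade-offs.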

jik876 commented 3 years ago

I'm closing this as there have been no recent updates. Please reopen if you need additional comments.