Questions about MPD's ability to enhance MOS and contribution of feature-matching loss

jik876 / hifi-gan

HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis

MIT License


Questions about MPD's ability to enhance MOS and contribution of feature-matching loss #64

Closed francislata closed 3 years ago

francislata commented 3 years ago

@jik876 This is more of a high-level question.

1) Table 4 in Section 4.2 shows the effect of applying MPD to MelGAN. I understand that HiFi-GAN and MelGAN use MSD with almost the same settings. Have you run experiments applying MPD to other GAN-based vocoders, and did they show improvements in perceptual quality?

2) The feature-matching loss is used with the MSD in HiFi-GAN as well as in MelGAN. Did you run any experiments with and without this loss to see how it affects perceptual quality?

Thank you!

jik876 commented 3 years ago

Thank you for your interest.

We didn't conduct experiments with other models. Since the MPD takes the raw waveform as its input and is not coupled to the generator architecture, it should not be complicated to try applying it to other models.
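To illustrate why the MPD does not depend on the generator, here is a minimal pure-Python sketch of the period reshaping described in the paper: 1D audio of length T is folded into 2D data of height T/p and width p, so each column holds samples spaced p apart. This is not code from the repository. The function name `reshape_by_period` is hypothetical, and reflect-padding the tail is an assumption made here so the length becomes a multiple of the period.

```python
def reshape_by_period(waveform, period):
    """Fold a 1D waveform into rows of length `period`.

    Hypothetical helper for illustration; the tail is reflect-padded
    (an assumption) so the length becomes a multiple of `period`.
    """
    samples = list(waveform)
    remainder = len(samples) % period
    if remainder:
        n_pad = period - remainder
        if n_pad >= len(samples):
            raise ValueError("waveform too short to reflect-pad for this period")
        # Reflect padding: mirror the samples just before the last one.
        samples += samples[-2:-2 - n_pad:-1]
    # Each row is one period; column j holds samples j, j + p, j + 2p, ...
    return [samples[i:i + period] for i in range(0, len(samples), period)]


rows = reshape_by_period(range(7), 3)
columns = [list(col) for col in zip(*rows)]
print(rows)     # 2D view that a sub-discriminator would convolve over
print(columns)  # samples spaced `period` apart
```

Only the waveform and the period enter this step, so in principle any vocoder that outputs a raw waveform could be paired with such a discriminator.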
In our experiments, the feature matching loss affected perceptual quality. However, the degree of quality degradation varied depending on the dataset. Since the feature matching loss makes the backward pass computationally heavy, it may be worth experimenting without it.
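For readers unfamiliar with the term, the following is a minimal pure-Python sketch of a feature matching loss as defined in the paper: the mean L1 distance between discriminator features of real and generated audio, summed over layers. It is not the repository's implementation. The function name and the flat-list representation of the feature maps are assumptions for illustration.

```python
def feature_matching_loss(fmaps_real, fmaps_fake):
    """Sum over layers of the mean absolute difference between features.

    `fmaps_real` and `fmaps_fake` are hypothetical inputs: one flat list
    of floats per discriminator layer, for real and generated audio.
    """
    total = 0.0
    for real, fake in zip(fmaps_real, fmaps_fake):
        # Mean L1 distance for this layer (normalized by feature count).
        total += sum(abs(r - g) for r, g in zip(real, fake)) / len(real)
    return total


real_features = [[1.0, 2.0], [0.0, 0.0, 3.0]]
fake_features = [[1.5, 2.0], [0.0, 3.0, 0.0]]
print(feature_matching_loss(real_features, fake_features))
```

In an actual training setup the gradient of this term flows back to the generator through every discriminator layer that produced a feature map, which is consistent with the extra backward cost mentioned above.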