I fed the original (unprocessed) input from the "Real Demo for Ted Talk" on the following sample page into the pretrained model:
https://daps.cs.princeton.edu/projects/HiFi-GAN/index.php?env-pairs=VCTK&speaker=p232&src-env=all
However, the output did not match the HiFi-GAN enhanced result shown on the sample page.
The applause at the beginning is still audible, and there is a jittery noise mixed in. Is the model used for the sample page different from the publicly available pretrained model? Why are the results different?
This repo is devoted to the model for speech synthesis from the paper "HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis". The examples you referred to are for a denoising model from a completely different paper, "HiFi-GAN: High-Fidelity Denoising and Dereverberation Based on Speech Deep Features in Adversarial Networks".
They are two separate models that happen to share the name HiFi-GAN.
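To make the distinction concrete: the model in this repo is a neural vocoder that maps a mel-spectrogram to a waveform. Running a recorded utterance through it only resynthesizes that recording, applause and reverberation included; it performs no enhancement. Below is a minimal resynthesis sketch following this repo's inference.py; the config, checkpoint, and wav paths are placeholders.

```python
import json
import torch
from scipy.io.wavfile import write

from env import AttrDict                      # repo helper: dict with attribute access
from models import Generator                  # this repo's HiFi-GAN generator (vocoder)
from meldataset import mel_spectrogram, load_wav, MAX_WAV_VALUE

# Placeholder paths -- substitute your own files.
CONFIG, CHECKPOINT = "config_v1.json", "generator_v1"
WAV_IN, WAV_OUT = "input.wav", "resynthesized.wav"

with open(CONFIG) as f:
    h = AttrDict(json.load(f))

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
generator = Generator(h).to(device)
generator.load_state_dict(torch.load(CHECKPOINT, map_location=device)["generator"])
generator.eval()
generator.remove_weight_norm()

# Load audio; it must already be at h.sampling_rate (22050 Hz for the v1 config).
wav, sr = load_wav(WAV_IN)
wav = torch.FloatTensor(wav / MAX_WAV_VALUE).to(device)

with torch.no_grad():
    # Mel-spectrogram of the *input* recording: the generator reconstructs
    # whatever this mel contains -- noise included.
    mel = mel_spectrogram(wav.unsqueeze(0), h.n_fft, h.num_mels, h.sampling_rate,
                          h.hop_size, h.win_size, h.fmin, h.fmax)
    audio = generator(mel).squeeze() * MAX_WAV_VALUE

write(WAV_OUT, h.sampling_rate, audio.cpu().numpy().astype("int16"))
```

If the resynthesized audio still contains the clapping, that is the expected behavior of a vocoder, not a defect of the checkpoint.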
Thank you very much for your answer. I understand now that the sample page I linked is not related to this repository. Thank you for your support.
Do the sample voices on the following page (e.g., Real Demo for Ted Talk) use any of the publicly available pretrained models? https://daps.cs.princeton.edu/projects/HiFi-GAN/index.php?env-pairs=VCTK&speaker=p257&src-env=all
I fed reverberant speech into the pretrained model, but the output quality was not as good as the demo audio; there is jittery noise mixed in. What could be the cause?
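One plausible cause, offered as an assumption rather than something confirmed in this thread: a sampling-rate mismatch. The pretrained generators expect audio at the sampling rate given in their config (22050 Hz for config_v1.json), and feeding, say, 48 kHz audio to a 22.05 kHz model typically produces exactly this kind of jittery, pitch-shifted output. A quick check, sketched with soundfile and scipy (the input filename is hypothetical):

```python
import soundfile as sf
from scipy.signal import resample_poly

TARGET_SR = 22050                             # sampling rate in this repo's config_v1.json
wav, sr = sf.read("reverberant_input.wav")    # hypothetical input file
print(f"input sampling rate: {sr} Hz")

if sr != TARGET_SR:
    # Resample before running inference; a rate mismatch commonly shows up
    # as jittery, pitch-shifted output from the vocoder.
    wav = resample_poly(wav, TARGET_SR, sr)
    sf.write("reverberant_input_22k.wav", wav, TARGET_SR)
```

Note that even with matching sampling rates, this vocoder will not remove reverberation: as explained above, it resynthesizes whatever the input mel-spectrogram contains.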