huawei-noah / Speech-Backbones

This is the main repository of open-sourced speech technology by Huawei Noah's Ark Lab.

Generated outputs sound robotic in some cases! #14

Open aniketp02 opened 2 years ago

aniketp02 commented 2 years ago

Hello!

I have trained Grad-TTS on quite a few datasets. It produces excellent results for most voices, but the output for heavy voices sounds very robotic! Can anyone explain possible reasons for this?

I have attached samples of the original and generated voices.

Thank you!

https://user-images.githubusercontent.com/81474354/172867498-0d368d4f-06e5-418f-b4d5-2fb2014af19c.mov

https://user-images.githubusercontent.com/81474354/172867486-0e1a491d-1654-4aa2-847e-ef4fcd7f29d0.mov

https://user-images.githubusercontent.com/81474354/172867501-243cfe51-dbc8-4461-ad61-a93eb6a295e0.mov

https://user-images.githubusercontent.com/81474354/172867473-41b32c2b-06d3-4aeb-8b50-65c994bd83cc.mov

ivanvovk commented 2 years ago

@aniketp02 Hey! Which HiFi-GAN version are you using? The checkpoint we provide in this repo is trained on LJSpeech only.

aniketp02 commented 2 years ago

@ivanvovk Thanks for pointing that out; I was indeed using the checkpoint provided in this repo. Using the Universal HiFi-GAN checkpoint certainly improves the results!!

https://user-images.githubusercontent.com/81474354/173000437-0ded9b6a-4d3a-4398-8399-560e17b5334d.mov
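
When swapping in a different vocoder checkpoint, as described above, it can help to confirm that the vocoder expects the same mel-spectrogram settings the acoustic model produces. The following is a minimal sketch, not code from this repository: the helper name `mel_config_mismatches` is hypothetical, the key names are assumed to follow the naming used in HiFi-GAN's JSON config files, and the numeric values are illustrative placeholders that should be replaced with the real settings of your models.

```python
import json

# Key names assumed to match HiFi-GAN-style JSON configs (an assumption).
MEL_KEYS = ("sampling_rate", "num_mels", "n_fft", "hop_size", "win_size", "fmin", "fmax")


def mel_config_mismatches(acoustic_cfg, vocoder_cfg, keys=MEL_KEYS):
    """Return {key: (acoustic_value, vocoder_value)} for each setting that differs."""
    return {
        k: (acoustic_cfg.get(k), vocoder_cfg.get(k))
        for k in keys
        if acoustic_cfg.get(k) != vocoder_cfg.get(k)
    }


# Illustrative placeholder values, not read from any real checkpoint.
acoustic = {
    "sampling_rate": 22050, "num_mels": 80, "n_fft": 1024,
    "hop_size": 256, "win_size": 1024, "fmin": 0, "fmax": 8000,
}
# In practice this would come from the vocoder's config file via json.load(...).
vocoder = json.loads(json.dumps(dict(acoustic, fmax=11025)))

for key, (a, v) in mel_config_mismatches(acoustic, vocoder).items():
    print(f"{key}: acoustic model uses {a}, vocoder expects {v}")
```

An empty result means the listed settings agree. It says nothing about whether the vocoder's training data covers the target voice, which was the problem in this thread.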

ivanvovk commented 2 years ago

@aniketp02 Glad to hear it helped! For even better quality, you can fine-tune HiFi-GAN on the outputs of Grad-TTS.
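
The fine-tuning suggestion above requires pairing each training audio file with a mel-spectrogram generated by Grad-TTS. The sketch below assumes the convention used by the official HiFi-GAN training script in fine-tuning mode: mel-spectrograms stored as `.npy` files in a folder such as `ft_dataset`, each named after its audio file. The helper names `expected_mel_path` and `missing_mels` are hypothetical, and generating the mels themselves is not shown.

```python
from pathlib import Path


def expected_mel_path(wav_path, mels_dir="ft_dataset"):
    """Path where the mel for wav_path is assumed to live: <mels_dir>/<wav stem>.npy."""
    return Path(mels_dir) / (Path(wav_path).stem + ".npy")


def missing_mels(wav_paths, mels_dir="ft_dataset"):
    """Return the wav files that have no matching mel-spectrogram file yet."""
    return [str(w) for w in wav_paths if not expected_mel_path(w, mels_dir).is_file()]


# Example: report which utterances still need a Grad-TTS mel before fine-tuning.
wavs = ["wavs/utt_0001.wav", "wavs/utt_0002.wav"]  # hypothetical file names
for wav in missing_mels(wavs):
    print(f"no mel found for {wav}, expected {expected_mel_path(wav)}")
```

Running a check like this before training catches name mismatches early, since a missing or misnamed mel file would otherwise surface only as a loading error partway through fine-tuning.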