Generated outputs sound robotic in some cases!

aniketp02 commented 2 years ago

Hello!

I have trained Grad-TTS on quite a few datasets and observed that it produces excellent results for most voices, but the generated results for heavy (deep) voices sound very robotic! Can anyone explain possible reasons for this?

I have attached the samples of original and generated voices.

Thank you!

Original male voice

https://user-images.githubusercontent.com/81474354/172867498-0d368d4f-06e5-418f-b4d5-2fb2014af19c.mov

Generated male voice

https://user-images.githubusercontent.com/81474354/172867486-0e1a491d-1654-4aa2-847e-ef4fcd7f29d0.mov

Original female voice

https://user-images.githubusercontent.com/81474354/172867501-243cfe51-dbc8-4461-ad61-a93eb6a295e0.mov

Generated female voice

https://user-images.githubusercontent.com/81474354/172867473-41b32c2b-06d3-4aeb-8b50-65c994bd83cc.mov

ivanvovk commented 2 years ago

@aniketp02 Hey! Which HiFi-GAN version do you use? The checkpoint we provide in this repo is trained on LJSpeech only.

aniketp02 commented 2 years ago

@ivanvovk Thanks for pointing that out; I was indeed using the checkpoint provided in this repo. Using the Universal HiFi-GAN checkpoint certainly improves the results!!

https://user-images.githubusercontent.com/81474354/173000437-0ded9b6a-4d3a-4398-8399-560e17b5334d.mov
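
When swapping in a different vocoder checkpoint, as described above, it can help to confirm that the vocoder expects the same mel-spectrogram settings that the acoustic model produces. The following is a minimal sketch, not part of this repo: `find_mel_mismatches` and `load_vocoder_config` are hypothetical helpers, the key names are assumed to follow the JSON config layout used by the HiFi-GAN reference implementation, and the values in `ACOUSTIC_MEL_SETTINGS` are assumptions that should be checked against your own training parameters.

```python
import json

# Assumed mel settings of the acoustic model (illustrative values only;
# replace them with the ones your Grad-TTS model was trained with).
ACOUSTIC_MEL_SETTINGS = {
    "sampling_rate": 22050,
    "num_mels": 80,
    "n_fft": 1024,
    "hop_size": 256,
    "win_size": 1024,
    "fmin": 0,
    "fmax": 8000,
}


def load_vocoder_config(path):
    """Read a vocoder config stored as JSON (hypothetical helper)."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def find_mel_mismatches(vocoder_config, acoustic_settings=ACOUSTIC_MEL_SETTINGS):
    """Return {key: (expected, actual)} for every setting that differs.

    A key missing from the vocoder config is reported with actual=None.
    """
    mismatches = {}
    for key, expected in acoustic_settings.items():
        actual = vocoder_config.get(key)
        if actual != expected:
            mismatches[key] = (expected, actual)
    return mismatches
```

An empty result only says the feature settings agree. It says nothing about whether the vocoder has seen voices like yours during training, which was the actual cause in this thread.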

ivanvovk commented 2 years ago

@aniketp02 Glad to hear it helped! For even better quality, you can fine-tune HiFi-GAN on the outputs of Grad-TTS.
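
For the fine-tuning suggested above, the data usually consists of recordings paired with the mel-spectrograms the acoustic model generated for them. The sketch below assumes the convention of the HiFi-GAN reference implementation, where each generated mel is saved as a `.npy` file whose name matches its audio file; that convention, the directory arguments, and the helper `check_finetune_pairs` are assumptions for illustration. It also assumes the mels are time-aligned with the recordings, which typically requires generating them with ground-truth durations.

```python
from pathlib import Path


def check_finetune_pairs(wav_dir, mel_dir):
    """Match recordings to generated mels by file stem (hypothetical helper).

    Returns sorted lists of stems that are paired, that lack a mel,
    and that lack a recording.
    """
    wav_stems = {p.stem for p in Path(wav_dir).glob("*.wav")}
    mel_stems = {p.stem for p in Path(mel_dir).glob("*.npy")}
    return {
        "paired": sorted(wav_stems & mel_stems),
        "missing_mel": sorted(wav_stems - mel_stems),
        "missing_wav": sorted(mel_stems - wav_stems),
    }
```

Running this before training is a cheap way to catch files that were skipped during mel generation.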

huawei-noah / Speech-Backbones, issue #14