voice cloning instability

mokel11 commented 1 year ago

I've been trying to synthesize files in a specific voice, but I'm seeing a lot of variation in the output. The generated voice occasionally has the wrong pitch or rhythm, and sometimes it's just random noise. My reference audio clips are organized in lists of about 50 elements; each is a 10-second torch tensor with a sampling frequency of 22050 Hz, as recommended in the voice customization guide. Do you have any tips on how I can process the reference audio further to get results that are, more often than not, true to the reference voice?

Thank you!
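
For anyone who wants to sanity-check reference clips against the layout described above (10-second mono clips at 22050 Hz), here is a minimal sketch. It is not part of tortoise-tts and is not known to fix the instability; the function names `check_clip` and `filter_clips` and the silence and clipping thresholds are hypothetical, chosen only for illustration. It uses plain Python so it runs without extra dependencies; a 1-D torch tensor can be passed in as `tensor.tolist()`.

```python
import math

SAMPLE_RATE = 22050                            # Hz, as stated in the post
CLIP_SECONDS = 10                              # clip length stated in the post
EXPECTED_SAMPLES = SAMPLE_RATE * CLIP_SECONDS  # 220500 samples per clip


def check_clip(samples, silence_rms=1e-3, clip_peak=0.999):
    """Return a list of problems found in one mono clip.

    `samples` is any sequence of floats expected to lie in [-1, 1].
    Both thresholds are illustrative guesses, not recommended values.
    """
    problems = []
    n = len(samples)
    if n != EXPECTED_SAMPLES:
        problems.append(f"length {n} != {EXPECTED_SAMPLES}")
    if n == 0:
        return problems
    if any(math.isnan(x) or math.isinf(x) for x in samples):
        problems.append("contains NaN or Inf")
        return problems
    peak = max(abs(x) for x in samples)
    rms = math.sqrt(sum(x * x for x in samples) / n)
    if rms < silence_rms:
        problems.append("nearly silent")
    if peak >= clip_peak:
        problems.append("possibly clipped")
    return problems


def filter_clips(clips):
    """Split clips into (good, rejected); rejected holds (index, problems)."""
    good, rejected = [], []
    for i, clip in enumerate(clips):
        problems = check_clip(clip)
        if problems:
            rejected.append((i, problems))
        else:
            good.append(clip)
    return good, rejected
```

Running `filter_clips` over a list of about 50 clips and reading the `rejected` entries is a cheap way to rule out silent, clipped, or wrongly sized references before looking at the model itself.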

yuyu1124 commented 8 months ago

I ran into a different problem: I input a woman's voice, but the output is a man's voice, and it can only produce a 13-second audio file.

yuyu1124 commented 8 months ago

I have resolved my problem by replacing read.py with read_fast.py. Generation is also more than 10x faster when you run this file.

neonbjb / tortoise-tts

voice cloning instability #654