neonbjb / tortoise-tts

A multi-voice TTS system trained with an emphasis on quality
Apache License 2.0
12.84k stars 1.78k forks

voice cloning instability #654

Open mokel11 opened 10 months ago

mokel11 commented 10 months ago

I've been trying to synthesize files in a specific voice, but I have been seeing a lot of variation in the output. The generated voice occasionally has the wrong pitch or rhythm. Sometimes it's just random noise. My reference audio clips are organized in lists of about 50 elements; each is a 10-second torch tensor with a sampling frequency of 22,050 Hz, as recommended in the voice customization guide. Do you have any tips on how I can process the reference audio further to get results that are, more often than not, true to the reference voice?
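One way to narrow this down is to sanity-check each reference clip before it is used. The following is only a rough sketch, not part of tortoise-tts: `clip_report` and `clip_problems` are hypothetical helper names, the thresholds are illustrative guesses, and it assumes each clip is mono with samples scaled to [-1, 1]. It flags clips that are nearly silent, clipped, or not close to the 10-second length described above.

```python
import math

SAMPLE_RATE = 22050      # Hz, the rate stated in the post
TARGET_SECONDS = 10.0    # clip length stated in the post


def clip_report(samples, sample_rate=SAMPLE_RATE):
    """Summarize one mono clip given as a flat sequence of floats in [-1, 1]."""
    n = len(samples)
    if n == 0:
        return {"seconds": 0.0, "peak": 0.0, "rms": 0.0, "clipped_fraction": 0.0}
    peak = max(abs(s) for s in samples)
    rms = math.sqrt(sum(s * s for s in samples) / n)
    # Fraction of samples sitting at (or right next to) full scale.
    clipped = sum(1 for s in samples if abs(s) >= 0.999) / n
    return {
        "seconds": n / sample_rate,
        "peak": peak,
        "rms": rms,
        "clipped_fraction": clipped,
    }


def clip_problems(report, target_seconds=TARGET_SECONDS):
    """Return a list of warnings; all thresholds here are assumptions."""
    problems = []
    if abs(report["seconds"] - target_seconds) > 0.5:
        problems.append("unexpected duration")
    if report["rms"] < 0.01:
        problems.append("nearly silent")
    if report["clipped_fraction"] > 0.001:
        problems.append("clipping")
    return problems
```

For a torch tensor `clip`, something like `clip_problems(clip_report(clip.flatten().tolist()))` should work. Dropping the flagged clips from the list is a cheap first experiment, though it may not explain all of the variation.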

Thank you!

yuyu1124 commented 6 months ago

I ran into a different problem: I input a woman's voice, but the output is a man's voice, and it can only output a thirteen-second audio file.

yuyu1124 commented 6 months ago

I have resolved my problem by replacing read.py with read_fast.py. Running this file also gives you more than 10x faster generation.