neonbjb / tortoise-tts

A multi-voice TTS system trained with an emphasis on quality
Apache License 2.0
12.79k stars 1.77k forks

Inconsistent Voice Timbre in Synthesized Speech #781

Open yuan1615 opened 3 months ago

yuan1615 commented 3 months ago

I noticed that when synthesizing different sentences, the voice timbre sounds inconsistent, as if it comes from different people, even though it’s supposed to be the same voice. How can this issue be resolved?

JohnHerry commented 1 month ago

I guess the speech quality of your training dataset may not be very good. Samples from the same speaker may contain many different styles, or speech from different speakers may have been wrongly labeled as coming from a single one. So when you synthesize different sentences with the "same speaker", the output may not be consistent in speaking style or even in timbre. I had the same problem you describe because I did not have a good training dataset either. All in all, the quality of a so-called "Large Model" depends on the data distribution, not only on the amount of data.
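One rough way to check for the mislabeled-speaker problem described above is to compare per-clip speaker embeddings against the speaker's average embedding and flag clips that sit far from it. The sketch below is only an illustration, not part of tortoise-tts: it assumes you already have one embedding vector per clip from some speaker encoder of your choice, and the function names, the toy vectors, and the 0.75 threshold are all hypothetical and would need tuning on real data.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def flag_outlier_clips(embeddings, threshold=0.75):
    """Return (clip_id, similarity) pairs for clips far from the speaker centroid.

    embeddings: dict mapping clip id -> list of floats. The vectors are assumed
    to come from an external speaker encoder (not provided here).
    threshold: hypothetical cutoff; clips below it are flagged.
    """
    if not embeddings:
        return []
    dim = len(next(iter(embeddings.values())))
    # Average the length-normalized vectors so loud or long clips do not dominate.
    centroid = [0.0] * dim
    for vec in embeddings.values():
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        for i, x in enumerate(vec):
            centroid[i] += x / norm
    centroid = [c / len(embeddings) for c in centroid]
    scored = [(cid, cosine(vec, centroid)) for cid, vec in embeddings.items()]
    # Least similar clips first.
    return sorted([s for s in scored if s[1] < threshold], key=lambda s: s[1])


# Toy data: three clips that agree and one that points elsewhere.
toy = {
    "a": [1.0, 0.1, 0.0],
    "b": [0.9, 0.2, 0.0],
    "c": [1.0, 0.0, 0.1],
    "d": [0.0, 1.0, 0.0],
}
suspects = flag_outlier_clips(toy)
print([clip_id for clip_id, _ in suspects])
```

Flagged clips are only candidates for a manual listen. A low score can mean a different speaker, but it can also mean a very different speaking style or a noisy recording, which are the other causes mentioned above.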