snakers4 / silero-models

Silero Models: pre-trained speech-to-text, text-to-speech and text-enhancement models made embarrassingly simple

Not reading individual alphabets correctly. #284

Open RRThivyan opened 3 months ago

RRThivyan commented 3 months ago

Dear Team,

The model does not read individual letters such as a, b, c correctly: it pronounces them too fast to hear. If I slow the audio down to make the individual letters audible, the overall audio quality suffers.

Kindly assist me if there is a way to avoid this.

Thanks.

snakers4 commented 3 months ago

Hi,

If this is about English TTS, it is an inherent problem in the design. To combat this, TTS should be based on phonemes.
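
Until a phoneme-based model is available, one possible user-side workaround is to rewrite the letters as pronounceable words before sending the text to the TTS model. The sketch below is only an illustration, not part of silero-models: the `LETTER_NAMES` table and the `spell` helper are hypothetical names, the spellings are guesses that should be tuned by ear, and it assumes the model inserts a short pause at commas.

```python
# Hypothetical respellings of English letter names; adjust them by ear
# for the voice in use. None of this comes from silero-models itself.
LETTER_NAMES = {
    "a": "ay", "b": "bee", "c": "see", "d": "dee", "e": "ee", "f": "eff",
    "g": "jee", "h": "aitch", "i": "eye", "j": "jay", "k": "kay", "l": "el",
    "m": "em", "n": "en", "o": "oh", "p": "pee", "q": "cue", "r": "ar",
    "s": "ess", "t": "tee", "u": "you", "v": "vee", "w": "double you",
    "x": "ex", "y": "why", "z": "zee",
}


def spell(letters: str) -> str:
    """Turn a run of letters into comma-separated letter names.

    Characters that are not ASCII letters (spaces, digits, punctuation)
    are skipped. The commas are there on the assumption that the TTS
    model pauses briefly at them.
    """
    names = [LETTER_NAMES[ch] for ch in letters.lower() if ch in LETTER_NAMES]
    return ", ".join(names)


# Build the text to synthesize; only the part that should be read
# letter by letter goes through spell().
text = "The code is " + spell("abc") + "."
```

The helper is applied only to the spans that should be spelled out, because replacing every single-letter word automatically would also rewrite ordinary words such as "a" and "I".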