snakers4 / silero-models

Silero Models: pre-trained speech-to-text, text-to-speech and text-enhancement models made embarrassingly simple

Adapting TTS for the Belarusian language #212

Closed Kryuski closed 1 year ago

Kryuski commented 1 year ago

I'm trying to adapt the Ukrainian TTS model v3_ua.pt to Belarusian. I ran into a problem with the pronunciation of the letter "Ў", which has no direct analogue in Ukrainian. When I pass "Ў" as "У", it sounds too long. I tried to shorten the duration of this letter with the SSML prosody tag: <speak>дагадзі<prosody rate="x-fast">у</prosody></speak> But in this case the generator separates the letter from the rest of the word, inserting a pause like the one between adjacent words.
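For reference, the two substitutions described above can be sketched as a small preprocessing step. This is only an illustration of how the input strings are built. The function names `plain_substitution` and `prosody_substitution` are hypothetical and not part of silero-models, and the model call itself is left out. The resulting SSML string would presumably be passed to the model's SSML input.

```python
from xml.sax.saxutils import escape

# SSML fragment used in place of "ў": a Ukrainian "у" spoken at the fastest preset rate.
PROSODY_U = '<prosody rate="x-fast">у</prosody>'


def plain_substitution(text: str) -> str:
    """First attempt: map Belarusian ў/Ў to Ukrainian у/У, keeping the case."""
    return text.replace("ў", "у").replace("Ў", "У")


def prosody_substitution(text: str) -> str:
    """Second attempt: wrap every substituted letter in a prosody tag.

    The text is XML-escaped first so that characters such as "&" or "<"
    do not break the SSML. Upper-case "Ў" is also mapped to lower-case "у",
    since letter case should not matter for the synthesized audio (assumption).
    """
    body = escape(text).replace("ў", PROSODY_U).replace("Ў", PROSODY_U)
    return f"<speak>{body}</speak>"


if __name__ == "__main__":
    word = "дагадзіў"
    print(plain_substitution(word))
    print(prosody_substitution(word))
```

For the word "дагадзіў", the second function produces exactly the SSML string quoted above, which is the input that triggers the unwanted pause.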

Can you recommend another solution, or is this not solvable without training a separate Belarusian TTS model?