Open egorsmkv opened 1 year ago
Also, I recently published two new voices - Mykyta (m) and Tetiana (f) - here: https://github.com/egorsmkv/ukrainian-tts-datasets
They have the same format as Lada's dataset. It would be nice to see them in piper.
Thanks @egorsmkv! Do you think this is a problem with the espeak-ng Ukrainian voice?
One option is to train directly on the text, though this will not work as well for numbers and dates.
Let me know your thoughts; I'm happy to retrain as the community has new ideas.
Yes, the problem is with espeak-ng.
It’s a good idea to train directly from the text.
Let me know how I can help by listening to samples from training. We can communicate using any messenger.
@synesthesiam hi, any update on the issue?
@egorsmkv Thanks for checking back. No updates just yet; I'm preparing for our next update for Home Assistant's Year of Voice: https://www.youtube.com/watch?v=Tk-pnm7FY7c
After the event, my plan is to add the ability to train directly from text into Piper (bypassing espeak-ng). What do you think of this alphabet for Ukrainian?
!
'
,
-
.
:
;
?
А
Б
В
Г
Ґ
Д
Е
Є
Ж
З
И
І
Ї
Й
К
Л
М
Н
О
П
Р
С
Т
У
Ф
Х
Ц
Ч
Ш
Щ
Ь
Ю
Я
а
б
в
г
ґ
д
е
є
ж
з
и
і
ї
й
к
л
м
н
о
п
р
с
т
у
ф
х
ц
ч
ш
щ
ь
ю
я
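To illustrate what training directly from text might involve, here is a minimal sketch that turns the alphabet above into a character-to-ID table. It is not Piper's actual implementation: the names `SYMBOLS`, `SYMBOL_TO_ID`, and `text_to_ids` are hypothetical, and the space character is an added assumption because it does not appear in the posted list.

```python
# Hypothetical sketch; these names are illustrative and not part of Piper.
PUNCTUATION = "!',-.:;?"
UPPER = "АБВГҐДЕЄЖЗИІЇЙКЛМНОПРСТУФХЦЧШЩЬЮЯ"
LOWER = UPPER.lower()

# Assumption: a space is included as a word separator (not in the posted list).
SYMBOLS = [" "] + list(PUNCTUATION) + list(UPPER) + list(LOWER)
SYMBOL_TO_ID = {symbol: index for index, symbol in enumerate(SYMBOLS)}


def text_to_ids(text):
    """Map each character to its ID, skipping characters outside the alphabet."""
    return [SYMBOL_TO_ID[ch] for ch in text if ch in SYMBOL_TO_ID]


if __name__ == "__main__":
    print(len(SYMBOLS))
    print(text_to_ids("Привіт!"))
```

Note that characters outside the table, such as digits, are silently skipped in this sketch, which is one reason a text-only approach handles numbers and dates poorly.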
@synesthesiam the alphabet is correct. One note: why are there both uppercase and lowercase letters?
If they aren't needed, that will simplify the model. I don't know enough about Ukrainian to know if lower-casing can have consequences like in German :smile:
Whoa 😮
No, lower-cased words are spelled the same as upper-cased ones.
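As a rough sketch of the simplification discussed here, the text could be lowercased before training so that only the 33 lowercase letters are needed. The helper name `normalize_case` is hypothetical; the check below only confirms that each Ukrainian uppercase letter maps to exactly one lowercase letter and back in Python's Unicode handling.

```python
# Hypothetical sketch; normalize_case is an illustrative name, not a Piper function.
UPPER = "АБВГҐДЕЄЖЗИІЇЙКЛМНОПРСТУФХЦЧШЩЬЮЯ"


def normalize_case(text):
    """Lowercase the text so the model only has to learn lowercase symbols."""
    return text.lower()


def case_mapping_is_one_to_one(letters):
    """True if every letter lowers to one character and uppers back to itself."""
    return all(
        len(ch.lower()) == 1 and ch.lower().upper() == ch for ch in letters
    )


if __name__ == "__main__":
    print(normalize_case("Світе, ПРИВІТ!"))
    print(case_mapping_is_one_to_one(UPPER))
```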
@egorsmkv Here are some samples from a multi-speaker model training on the Ukrainian-TTS datasets: https://drive.google.com/drive/folders/1xl8qJdOpPuimokXcwF8uV5lOrgdpPAJ9?usp=share_link
Are the pronunciations any better?
@synesthesiam yes, it is definitely better now!
The samples sound correct. Thanks a lot for this improvement!
Awesome, thanks! I'll get this voice uploaded once training is finished 🙂
@synesthesiam, @egorsmkv wow, thanks a lot! The above samples are really a huge improvement.
What is a reasonable expectation for when one will be able to use the above voices?
Also, is there anything one can do to help improve the voice quality further or add new voices? I've seen the Piper docs linking to https://github.com/egorsmkv/ukrainian-tts-datasets/tree/main/lada, which in turn links to https://huggingface.co/spaces/theodotus/ukrainian-voices, which shows 5 different voices. Right now, only 1 is available, but the above samples increase the number to 3. I have no clue about the nuances of TTS or what it takes to add a new voice, but I'm wondering if there's something I could do to help increase the number of voices [available in Piper] or their quality?
Thanks!
@ashald You can look at my repository; I've published other voices there.
@egorsmkv I see you published models on HuggingFace (Mykyta, Olena, Harakternyk) which to me sound much better than what's shipped in Piper today. Do you think those models can be made available in Piper as well? Or, if you're not interested in contributing them, can you please advise on how I can convert the PT file format into ONNX and generate the metadata JSON required to use them with Piper? Thanks!
Hello. I don't think converting them for Piper is possible. You would need to train these models from the ground up.
I found out that Lada does not pronounce digits for some reason. If I ask for the temperature, she pronounces everything except the digits. Maybe it is because my temperature is formatted as a floating-point number; I don't know. I would like to hear some thoughts on that.
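One possible explanation, not confirmed in this thread, is that a voice trained directly on the letter alphabet has no symbols for digits, so they are dropped. The sketch below is a hypothetical pre-check (the names `ALPHABET` and `find_unsupported` are illustrative, and the space character is an assumption) that lists characters the posted alphabet does not cover, so they can be spelled out as words before synthesis.

```python
# Hypothetical sketch; assumes the voice only knows the alphabet posted in this thread.
# Assumption: a space is part of the alphabet even though it is not in the posted list.
ALPHABET = set(" !',-.:;?" + "абвгґдеєжзиіїйклмнопрстуфхцчшщьюя")


def find_unsupported(text):
    """Return characters, in order of first appearance, that the alphabet lacks."""
    seen = []
    for ch in text.lower():
        if ch not in ALPHABET and ch not in seen:
            seen.append(ch)
    return seen


if __name__ == "__main__":
    # "Температура 22.5 градуса" means "The temperature is 22.5 degrees".
    print(find_unsupported("Температура 22.5 градуса"))
```

If the list is non-empty, the input would need normalizing (for example, writing the number out in words) before it is sent to the voice.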
Hello/Pryvit to all!
I am a native speaker of Ukrainian and the author of the initiative that brought us Lada's voice.
I ran some tests with Piper and have some thoughts to share. In short: it sounds incorrect; it seems like libespeak-ng mixes up Russian and Ukrainian letters.
I'm opening this issue so we can discuss the problem.
We have a community on Telegram - https://t.me/speech_synthesis_uk - where we're developing open source voices for synthesis. We can talk faster there.
Supplemental materials:
Audio: https://user-images.githubusercontent.com/7875085/230715900-21535afa-4406-4002-a2cb-7181e16eb876.mp4
Text in Ukrainian: світе, привіт! я хочу протестувати цей голос
Translation: hello, world! I want to test this voice