rhasspy / piper

A fast, local neural text to speech system
https://rhasspy.github.io/piper-samples/
MIT License

Incorrect pronunciation of Ukrainian voice Lada #32

Open egorsmkv opened 1 year ago

egorsmkv commented 1 year ago

Hello/Pryvit to all!

I am a native speaker of Ukrainian and the author of the initiative that brought us the Lada voice.

I ran some tests with piper and have some thoughts to share. In short: it sounds incorrect; it seems like libespeak-ng mixes up Russian and Ukrainian letters.

I'd like to open this issue to start a discussion about it.

We have a community on Telegram - https://t.me/speech_synthesis_uk - where we're developing open-source voices for synthesis; we can talk faster there.

Supplemental materials:

Audio: https://user-images.githubusercontent.com/7875085/230715900-21535afa-4406-4002-a2cb-7181e16eb876.mp4

Text in Ukrainian: світе, привіт! я хочу протестувати цей голос

Translation: the world, hello! I want to test this voice

egorsmkv commented 1 year ago

Also, recently I have published two new voices - Mykyta (m) and Tetiana (f) - here https://github.com/egorsmkv/ukrainian-tts-datasets

They have the same format as Lada's dataset. It would be nice to see them in piper.

synesthesiam commented 1 year ago

Thanks @egorsmkv! Do you think this is a problem with the espeak-ng Ukrainian voice?

One option is to train directly on the text, though this will not work as well for numbers and dates.

Let me know your thoughts; I'm happy to retrain as the community has new ideas.

egorsmkv commented 1 year ago

Yes, the problem is with espeak-ng.

It’s a good idea to train directly from the text.

Let me know how I can help by listening to samples from training. We can communicate using any messenger.

egorsmkv commented 1 year ago

@synesthesiam hi, any update on the issue?

synesthesiam commented 1 year ago

@egorsmkv Thanks for checking back. No updates just yet; I'm preparing for our next update for Home Assistant's Year of Voice: https://www.youtube.com/watch?v=Tk-pnm7FY7c

After the event, my plan is to add the ability to train directly from text into Piper (by-passing espeak-ng). What do you think of this alphabet for Ukrainian?

!
'
,
-
.
:
;
?
А
Б
В
Г
Ґ
Д
Е
Є
Ж
З
И
І
Ї
Й
К
Л
М
Н
О
П
Р
С
Т
У
Ф
Х
Ц
Ч
Ш
Щ
Ь
Ю
Я
а
б
в
г
ґ
д
е
є
ж
з
и
і
ї
й
к
л
м
н
о
п
р
с
т
у
ф
х
ц
ч
ш
щ
ь
ю
я

egorsmkv commented 1 year ago

@synesthesiam the alphabet is correct. One note: why are there both uppercase and lowercase letters?

synesthesiam commented 1 year ago

If they aren't needed, that will simplify the model. I don't know enough about Ukrainian to know if lower-casing can have consequences like in German :smile:

egorsmkv commented 1 year ago

Whoa 😮

No, lower-cased words are spelled the same as upper-cased ones.
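
As a rough illustration of the alphabet discussed above, the sketch below lower-cases input text and maps each character to an integer id, which is one common way to feed raw text to a model. The names `ALPHABET`, `build_symbol_table`, and `text_to_ids` are hypothetical and are not part of Piper; adding a space symbol and silently dropping unknown characters are assumptions made for this example.

```python
# Hypothetical sketch: not Piper's actual preprocessing code.
PUNCTUATION = "!',-.:;?"
LOWERCASE = "абвгґдеєжзиіїйклмнопрстуфхцчшщьюя"  # 33 Ukrainian letters

# Assumption: a space symbol is included so word boundaries are kept.
ALPHABET = " " + PUNCTUATION + LOWERCASE


def build_symbol_table(alphabet):
    """Map each symbol to its position in the alphabet string."""
    return {ch: i for i, ch in enumerate(alphabet)}


def text_to_ids(text, table):
    """Lower-case the text and convert known characters to ids.

    Characters outside the table are skipped (an assumption; a real
    system might raise an error or substitute a placeholder instead).
    """
    return [table[ch] for ch in text.lower() if ch in table]


table = build_symbol_table(ALPHABET)
ids = text_to_ids("Світе, привіт!", table)
```

With lower-casing applied first, `text_to_ids("ПРИВІТ", table)` and `text_to_ids("привіт", table)` give the same ids, so the uppercase half of the alphabet is not needed in the symbol table.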

synesthesiam commented 1 year ago

@egorsmkv Here are some samples from a multi-speaker model training on the Ukrainian-TTS datasets: https://drive.google.com/drive/folders/1xl8qJdOpPuimokXcwF8uV5lOrgdpPAJ9?usp=share_link

Are the pronunciations any better?

egorsmkv commented 1 year ago

@synesthesiam yes, it is definitely better now!

The samples sound correct. Thanks a lot for this improvement!

synesthesiam commented 1 year ago

Awesome, thanks! I'll get this voice uploaded once training is finished 🙂

ashald commented 1 year ago

@synesthesiam, @egorsmkv wow, thanks a lot! The above samples are really a huge improvement.

What can one reasonably expect regarding being able to use the above voices?

Also, is there anything one can do to help improve the voice quality further or add new voices? I've seen the Piper docs link to https://github.com/egorsmkv/ukrainian-tts-datasets/tree/main/lada, which in turn links to https://huggingface.co/spaces/theodotus/ukrainian-voices, which shows 5 different voices. Right now only 1 is available, but the above samples increase the number to 3. I have no clue about the nuances of TTS and what it takes to add a new voice, but I'm wondering if there's something I could do to help increase the number of voices available in Piper or their quality.

Thanks!

egorsmkv commented 1 year ago

@ashald You can look at my repository; I've published other voices there.

ashald commented 11 months ago

@egorsmkv I see you published models on HuggingFace (Mykyta, Olena, Harakternyk) which to me sound much better than what's shipped in Piper today. Do you think those models can be made available in Piper as well? Or, if you're not interested in contributing them, can you please advise on how I can convert the PT file format into ONNX and generate the metadata JSON required to use them with Piper? Thanks!

egorsmkv commented 11 months ago

Hello. I don't think converting them for Piper is possible. You would need to train these models from the ground up.

boonya commented 10 months ago

I found out that Lada does not pronounce digits for some reason. If I ask for the temperature, she pronounces everything except the digits. Maybe it is because my temperature is formatted as a floating-point number, I don't know. But I would like to hear some thoughts on that.
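
One way to check whether input such as a temperature reading contains characters that a text-trained voice has never seen is to compare it against the alphabet proposed earlier in this thread, which contains no digits. This is only a sketch: `SUPPORTED` and `unsupported_chars` are hypothetical names, the space symbol is an assumption, and whether the shipped Lada voice actually uses this alphabet is not confirmed anywhere in this thread.

```python
# Hypothetical sketch: not part of Piper.
# Assumption: the voice only knows the symbols proposed in this thread,
# plus a space.
SUPPORTED = set(" !',-.:;?" + "абвгґдеєжзиіїйклмнопрстуфхцчшщьюя")


def unsupported_chars(text):
    """Return distinct characters of the lower-cased text that are not
    in SUPPORTED, in order of first appearance."""
    seen = []
    for ch in text.lower():
        if ch not in SUPPORTED and ch not in seen:
            seen.append(ch)
    return seen


missing = unsupported_chars("Температура 21.5 градуса")
```

If `missing` turns out to contain the digits, spelling the number out as words before passing the text to the voice would be one thing to try.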