thorstenMueller / Thorsten-Voice

Thorsten-Voice: A free-to-use, offline, high-quality German TTS voice should be available for every project without any licensing hassle.
http://www.thorsten-voice.de
Creative Commons Zero v1.0 Universal

Rhasspy German voice #10

Closed: synesthesiam closed this issue 3 years ago

synesthesiam commented 4 years ago

Hi Thorsten, thank you for your contribution!

I'm using your dataset to train a model for Rhasspy, an open source offline voice assistant (community site). I'm using a fork of MozillaTTS called Larynx to train a GlowTTS model and a multiband melgan vocoder.

It's not done yet, but here are some samples (without vocoder): https://drive.google.com/drive/folders/1IImZKg5CES02CxKK4vk8iy9gkIyvHmMk?usp=sharing

My TTS models use a restricted set of phonemes to keep their size down, which unfortunately makes them incompatible with MozillaTTS. I created a tool called gruut to do phonemization in a different way than phonemizer (using a lexicon and a pre-trained grapheme-to-phoneme model).
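As a rough illustration of the lexicon-plus-fallback idea (this is not gruut's actual API), a lookup could work like the sketch below. The tiny `LEXICON`, the `LETTER_TO_PHONEME` table standing in for a trained grapheme-to-phoneme model, and the function names are all hypothetical.

```python
from typing import Dict, List

# Hypothetical mini-lexicon: word -> phonemes (illustrative entries only).
LEXICON: Dict[str, List[str]] = {
    "bitte": ["b", "ɪ", "t", "ə"],
    "sie": ["z", "iː"],
}

# Hypothetical stand-in for a pre-trained grapheme-to-phoneme model:
# a naive one-letter-to-one-phoneme table.
LETTER_TO_PHONEME: Dict[str, str] = {
    "a": "a", "e": "ɛ", "i": "ɪ", "o": "ɔ", "u": "ʊ",
    "ä": "ɛ", "ö": "œ", "ü": "ʏ",
    "b": "b", "k": "k", "l": "l", "m": "m",
    "n": "n", "r": "ʁ", "s": "s", "t": "t",
}


def guess_phonemes(word: str) -> List[str]:
    """Fallback for out-of-lexicon words; letters not in the table are skipped."""
    return [LETTER_TO_PHONEME[ch] for ch in word if ch in LETTER_TO_PHONEME]


def phonemize(text: str) -> List[List[str]]:
    """Look each word up in the lexicon first, then fall back to the guesser."""
    result: List[List[str]] = []
    for token in text.lower().split():
        word = token.strip(".,!?\"")
        if not word:
            continue
        result.append(LEXICON.get(word) or guess_phonemes(word))
    return result
```

Here `phonemize("Können sie bitte")` takes "sie" and "bitte" from the lexicon and guesses "können" letter by letter. A real grapheme-to-phoneme model would replace the letter table, but the control flow stays the same: lexicon first, model only for unknown words.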

To get an idea of what a "finished" voice is like, see the Dutch voice I trained from rdh's dataset (also a user on the MozillaTTS Discourse site). I also released that voice as an add-on for Home Assistant :relaxed:

I'll post here again when the model and Docker images are ready. Thanks again!

thorstenMueller commented 4 years ago

Hey Michael (@synesthesiam). Thanks for your nice words and for your effort on training a GlowTTS + mb melgan model for Rhasspy and Home Assistant.

Your preview audio files sound quite good, but it seems you're having some trouble with German umlauts, as in "Können sie bitte langsamer sprechen". It's pronounced as "Konnen sie bitte ...".

The Dutch version sounds really good, even if I can't tell for sure since I don't speak Dutch ;-).

I hope you can successfully complete training, and I'd be happy if you shared your intermediate results here.

synesthesiam commented 4 years ago

Thanks for the feedback! I only speak English, so it really helps to know where there are incorrect pronunciations. The "können" case is interesting because the dictionary has the correct phones /k œ n n ɛ n s/ but the model is pronouncing it with a longer "o" sound instead of œ. So it seems more training is required (always the answer, right?)
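One way to rule out the lexicon as the source of an umlaut error like this is to scan it for words spelled with "ö" whose phones contain no ö-like vowel. The following is a minimal sketch; the helper name and the sample entries are hypothetical and not taken from the actual dictionary.

```python
from typing import Dict, List

# Front rounded vowels that a German "ö" is normally transcribed with.
O_UMLAUT_PHONES = {"œ", "ø", "øː"}


def missing_umlaut_phones(lexicon: Dict[str, List[str]]) -> List[str]:
    """Return words spelled with "ö" whose phones contain no ö-like vowel."""
    return sorted(
        word
        for word, phones in lexicon.items()
        if "ö" in word.lower() and not O_UMLAUT_PHONES.intersection(phones)
    )


# Hypothetical entries; the second one is deliberately wrong.
sample = {
    "können": ["k", "œ", "n", "ə", "n"],
    "hören": ["h", "oː", "ʁ", "ə", "n"],
}
suspects = missing_umlaut_phones(sample)
```

If a check like this comes back empty for the affected words, the mispronunciation is more likely coming from the acoustic model than from the dictionary.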

I've updated the samples with a vocoder (not finished training).

synesthesiam commented 3 years ago

Not perfect, but I've released a "version 1" here: https://github.com/rhasspy/de_larynx-thorsten/

A Docker image is available for Raspberry Pi 2-4 and PC, as well as Hass.io add-ons for Home Assistant.

thorstenMueller commented 3 years ago

That's great - congratulations on the release of version 1.

It's sounding quite good, even if the "umlaut" problem in "Können" is still there and the word "vegetarisches" sounds a little bit weird, but nevertheless it's understandable.

Thanks for sharing your progress with us.

thorstenMueller commented 3 years ago

@synesthesiam I'd suggest keeping further discussion on the Mozilla Discourse (https://discourse.mozilla.org/t/contributing-my-german-voice-for-tts/48150/200?u=mrthorstenm)