synesthesiam closed 3 years ago
Hey Michael (@synesthesiam). Thanks for your nice words and for your effort on training a GlowTTS + mb melgan model for Rhasspy and Home Assistant.
Your preview audio files sound quite good, but it seems you're having some trouble with German umlauts, as in "Können sie bitte langsamer sprechen". It's pronounced as "Konnen sie bitte ...".
The Dutch version sounds really good, even if I can't tell for sure since I don't speak Dutch ;-).
I hope you can successfully complete training, and I'd be happy if you shared your intermediate results here.
Thanks for the feedback! I only speak English, so it really helps to know where there are incorrect pronunciations. The "können" case is interesting because the dictionary has the correct phones /k œ n n ɛ n s/, but the model is pronouncing it with a longer "o" sound instead of œ. So it seems more training is required (always the answer, right?)
I've updated the samples with a vocoder (not finished training).
Not perfect, but I've released a "version 1" here: https://github.com/rhasspy/de_larynx-thorsten/
A Docker image is available for Raspberry Pi 2-4 and PC, as well as Hass.io add-ons for Home Assistant.
That's great - congratulations on the release of version 1.
It's sounding quite good, even if the "umlaut" problem in "Können" is still there and the word "vegetarisches" sounds a little bit weird, but nevertheless it's understandable.
Thanks for sharing your progress with us.
@synesthesiam I'd suggest keeping further discussion on the Mozilla Discourse (https://discourse.mozilla.org/t/contributing-my-german-voice-for-tts/48150/200?u=mrthorstenm)
Hi Thorsten, thank you for your contribution!
I'm using your dataset to train a model for Rhasspy, an open source offline voice assistant (community site). I'm using a fork of MozillaTTS called Larynx to train a GlowTTS model and a multiband melgan vocoder.
It's not done yet, but here are some samples (without vocoder): https://drive.google.com/drive/folders/1IImZKg5CES02CxKK4vk8iy9gkIyvHmMk?usp=sharing
My TTS models use a restricted set of phonemes to keep their size down, which unfortunately makes them incompatible with MozillaTTS. I created a tool called gruut to do phonemization in a different way than phonemizer (using a lexicon and a pre-trained grapheme-to-phoneme model).
To get an idea of what a "finished" voice is like, see the Dutch voice I trained from rdh's dataset (also a user on the MozillaTTS Discourse site). I also released that voice as an add-on for Home Assistant :relaxed:
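For anyone curious what "a lexicon plus a grapheme-to-phoneme model" means in practice, here is a minimal sketch of the general idea. It is not gruut's actual API: `LEXICON`, `fallback_g2p`, and `phonemize` are hypothetical names, the lexicon entries are only illustrative, and the fallback is a trivial stand-in for a trained model.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical toy lexicon (word -> phonemes). Entries are illustrative only
# and are not copied from gruut's lexicon.
LEXICON: Dict[str, List[str]] = {
    "können": ["k", "œ", "n", "ə", "n"],
    "bitte": ["b", "ɪ", "t", "ə"],
}


def fallback_g2p(word: str) -> List[str]:
    """Stand-in for a pre-trained grapheme-to-phoneme model.

    A real model would predict phonemes; this one just emits one symbol
    per letter so the example stays self-contained.
    """
    return list(word)


def phonemize(
    text: str,
    lexicon: Optional[Dict[str, List[str]]] = None,
    g2p: Callable[[str], List[str]] = fallback_g2p,
) -> List[List[str]]:
    """Look each word up in the lexicon; guess with g2p if it is missing."""
    if lexicon is None:
        lexicon = LEXICON
    result: List[List[str]] = []
    for token in text.lower().split():
        word = token.strip(".,!?\"'")  # drop surrounding punctuation
        if not word:
            continue
        phonemes = lexicon.get(word)
        if phonemes is None:
            phonemes = g2p(word)  # out-of-vocabulary word: fall back to the model
        result.append(phonemes)
    return result


if __name__ == "__main__":
    for word_phonemes in phonemize("Können sie bitte langsamer sprechen"):
        print(" ".join(word_phonemes))
```

In this sketch, words found in the lexicon get their listed pronunciation, and everything else goes through the fallback, so the lexicon only has to cover the words where guessing would go wrong.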
I'll post here again when the model and Docker images are ready. Thanks again!