eqikkwkp25-cyber opened 1 year ago
Many thanks! I can schedule training ES on the German dataset, but I will need someone to validate it. I also have to find a good lexicon for that dataset. On a single GPU (I used an RTX 6000), the models train in a day. The models are a lot faster in ONNX. I still have to fill in many details in the GitHub repo (training and conversion to other model formats).
I'm pretty sure @thorstenMueller could help out with the Lexicon 🙂.
Alternatively I have a pretty big Lexicon file from Zamia Speech models (X-SAMPA-ish phonemes I think) that I used to train a Phonetisaurus G2P model as well (see adapt-lm repo), if that helps?
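For anyone wanting to reuse such a lexicon file, here is a minimal, hedged sketch of loading it into a `{word: phonemes}` dict. The exact Zamia lexicon layout is an assumption here (one entry per line, word and phoneme string separated by a tab or space), so adjust the separator for the real file:

```python
# Minimal sketch: parse a pronunciation lexicon into {word: [phonemes]}.
# The line format (word <tab> phoneme string) is an assumption, not
# confirmed from this thread -- adapt it to the actual Zamia file.
def load_lexicon(lines):
    lexicon = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith(";;"):  # skip blanks and comments
            continue
        word, _, phones = line.partition("\t")
        if not phones:  # fall back to space-separated entries
            word, _, phones = line.partition(" ")
        lexicon[word] = phones.strip().split()
    return lexicon

# Illustrative X-SAMPA-ish entries (made up for the example):
sample = ["hallo\th a l o:", "welt\tv E l t"]
lex = load_lexicon(sample)
print(lex["hallo"])  # ['h', 'a', 'l', 'o:']
```

A dict like this can then be dumped in whatever format the G2P trainer (e.g. Phonetisaurus) expects.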
For sure, I can validate intermediate training results or generated models as a native German speaker, if that is what you mean by validation.
Hi, how can I help with that 🙂?
The dataset preparation has been documented in the current commit.
Training for the German language could be replicated using this procedure.
Many thanks, Rowel, for this repository and the provided English models. I like the quality and the RTF on CPU; it's about 10 on my PC, and I will soon install it on different kinds of Raspberry Pis.
Do you have any plans to train a German model, for instance based on the ThorstenVoice Dataset 2022.10?
Can you share your experience with regards to training speed?