apiote opened 1 year ago
Is there a GLaDOS voice for Piper like there was for Larynx (https://github.com/rhasspy/larynx/issues/56)? Or is there an easy way to convert one to the other? I added phonemes and missing entries in the JSON file, but there are still phonemes missing and errors about the model.
Tomorrow I will train a model on a GLaDOS dataset, but what worries me is whether the license allows publishing it.
That's what I was afraid of. Would publishing instructions for training the model yourself be more in the clear? I have no idea about the hardware requirements, though.
To make things easier, I use Colab notebooks, since I don't have the hardware. To run it locally, you would need an NVIDIA GPU, and the parameters (e.g. batch_size) can be adjusted to match your GPU's capabilities.
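For reference, here is a minimal sketch of what a local training run might look like. The flags follow Piper's TRAINING.md; the paths are placeholders, and the --batch-size value is an assumption you would tune to your VRAM:

```
# Sketch of a local Piper training run (paths are placeholders).
# Lower --batch-size if you run out of GPU memory.
python3 -m piper_train \
  --dataset-dir /path/to/training_dir/ \
  --accelerator gpu \
  --devices 1 \
  --batch-size 32 \
  --validation-split 0.0 \
  --num-test-examples 0 \
  --max_epochs 10000 \
  --checkpoint-epochs 1 \
  --precision 32
```

Fine-tuning works the same way, with --resume_from_checkpoint pointing at an existing voice's .ckpt file.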
I don't have the hardware either. And I guess that even if only detailed instructions were published, they could still get DMCA'd, as happened to tools like youtube-dl.
@rmcpantoja any update on the Glados training?
Hi @dnhkng, I have made two GLaDOS models through my Colab notebooks, one in Spanish and the other in English, but unfortunately the datasets contain a lot of material, so they require more training, and I don't have the resources to buy Colab Pro. They are the following: English and Spanish
@rmcpantoja The English link doesn't work. I was going to try fine-tuning on the original game voice data; I have 2x 4090s, so I should have enough compute.
I could rip the voices from https://theportalwiki.com/wiki/GLaDOS_voice_lines, but is there an already-prepared dataset? Happy to share the results!
Hi @dnhkng, that's strange; I can open the Drive folder with the English model without problems. Anyway, here is the model exported to ONNX.
The model was trained using this dataset, though I fixed many incorrect transcriptions myself.
@rmcpantoja Thanks for the export! I found the checkpoint file eventually though, sorry! It sounds pretty good; I see it trained on Colab for 2.25 hours.
I scraped the GLaDOS dataset (using only the Portal 2 voice and DLC lines), manually filtered out all the wav files that contained extras (laughing, telephones, beeping, etc.), and fixed all the text. That gave me about 1 hour of high-quality data. I have currently fine-tuned for 15 hours on a 4090; it sounds very good, and the loss is still decreasing. I will train for 24 hours and see how the loss curves look.
EDIT: Here are samples after 24 hours of fine-tuning. 'a' is the generated sample, 'b' is an unseen sample from the game. https://drive.google.com/drive/folders/1WVpS2zlJ9JqXIYV8Fkjoy5Fjz-eWPaEh?usp=sharing
I think the generated sample is better! Kudos to Piper, this is amazing!
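In case anyone wants to reproduce the dataset prep described above, here is a minimal sketch of grabbing the voice lines from the wiki page; the grep pattern and file layout are my own assumptions, not the actual script used:

```
# Collect the unique .wav links from the wiki page, then download them.
curl -s https://theportalwiki.com/wiki/GLaDOS_voice_lines \
  | grep -oE 'https://[^"]+\.wav' \
  | sort -u > urls.txt
wget -P wavs/ -i urls.txt
```

The manual steps, filtering out the clips with laughs and beeps and writing the corrected transcripts, still have to be done by hand.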
@dnhkng Hello, is it possible to get the model?
Yes, I will share it in the next few days. Doing a big refactor on the inference code.
Sign me up as well
@dnhkng Any update on the model?
OK, the model is available here: https://github.com/dnhkng/GlaDOS
You can find the GlaDOS model in the models directory.
It includes my new code base for using the voice. Have a look at the Jupyter notebook to see how to use it.
If you instead want to use it with Piper, just take a medium-size model's .onnx.json file, copy it, and rename it glados.onnx.json; it will then run with Piper.
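Concretely, something like this should work; the lessac voice is just an example of a medium-quality config to borrow:

```
# Borrow the config from any medium-quality Piper voice (example filename).
cp en_US-lessac-medium.onnx.json glados.onnx.json

# Synthesize with Piper's CLI.
echo 'The cake is a lie.' | piper --model glados.onnx --output_file cake.wav
```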
Thank you very much for your work @dnhkng 👍👍👍
For those of you who want to run the GLaDOS ONNX model on iOS, Android, Raspberry Pi, or from C++, C, Go, C#, Kotlin, Swift, Python, or Java, on Windows, Linux, macOS, etc., please have a look at https://github.com/k2-fsa/sherpa-onnx
We provide a Colab notebook showing how to convert the GLaDOS model to sherpa-onnx: https://colab.research.google.com/drive/1m3Zr8H1RJaoZu4Y7hpQlav5vhtw3A513?usp=sharing
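If you'd rather not run the Colab, the heart of the conversion is generating sherpa-onnx's tokens.txt from the phoneme_id_map in the Piper config (the notebook also attaches some metadata to the .onnx file). A hedged one-liner based on my reading of the two formats, so verify against the notebook:

```
# tokens.txt is one "<symbol> <id>" pair per line.
jq -r '.phoneme_id_map | to_entries[] | "\(.key) \(.value[0])"' \
  glados.onnx.json > tokens.txt
```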
The following is a sample command using the converted model with sherpa-onnx:

```
# You can also use sherpa-onnx-offline-tts-play
sherpa-onnx-offline-tts \
  --vits-model=./glados.onnx \
  --vits-tokens=./tokens.txt \
  --vits-data-dir=./espeak-ng-data \
  --output-filename=./test-glados.wav \
  "How are you doing? This is a text-to-speech application using next generation Kaldi."
```
https://github.com/rhasspy/piper/assets/5284924/9e75eca4-b73d-46a4-b0c4-e88df3a2ae4b
By the way, I just managed to build Android APKs for the pre-trained GLaDOS models mentioned in this issue, i.e., for the following two models:
You can find the APKs at https://k2-fsa.github.io/sherpa/onnx/tts/apk.html
For your convenience, the download address is given below:
If you are interested in how we build the APKs, please read the following documentation: https://k2-fsa.github.io/sherpa/onnx/android/index.html
You can also try the models in your browser via the following Hugging Face space