grwgreg / silviux

voice to code on linux with kaldi
BSD 2-Clause "Simplified" License

Update to work with newer models #1

Open jvkim opened 1 year ago

jvkim commented 1 year ago

The Kaldi site has two newer models available, the Librispeech and Gigaspeech models, at https://www.kaldi-asr.org/models.html. I've been looking through the code at https://github.com/grwgreg/silviux/blob/main/server/silviux-server/lm-script/silviux.sh, because that is where the model files are downloaded from the Kaldi site. Changing the URL to point at the new models fails because files are missing or named differently. Are these new models at all compatible with this project? Thanks!

grwgreg commented 1 year ago

There are two parts to the question of whether the new models are compatible with silviux. The first is whether an already-built model will work with the GStreamer server, and the second is whether the language model tools and scripts can be used to build new models. Regarding the first part, if the YAML config has the needed properties and the version of Kaldi is up to date, I think it should work. The server is a fork of https://github.com/alumae/kaldi-gstreamer-server, so I'd try to get the model working there first.
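Since the download step fails on missing or renamed files, a quick check of an unpacked model directory can show which files the config would not be able to point at. This is only a rough sketch: the property names are modeled on the sample nnet2 config in kaldi-gstreamer-server, and the relative paths are my assumptions, not something verified against the Librispeech or Gigaspeech archives.

```python
from pathlib import Path

# ASSUMPTION: property names follow the sample nnet2 YAML config in
# kaldi-gstreamer-server; the relative paths are illustrative guesses and
# will likely need adjusting for each downloaded model.
EXPECTED_FILES = {
    "model": "final.mdl",
    "fst": "HCLG.fst",
    "word-syms": "words.txt",
    "mfcc-config": "conf/mfcc.conf",
    "ivector-extraction-config": "conf/ivector_extractor.conf",
}


def missing_model_files(model_dir, expected=EXPECTED_FILES):
    """Return {property: relative_path} for every expected file that is absent."""
    root = Path(model_dir)
    return {
        prop: rel_path
        for prop, rel_path in expected.items()
        if not (root / rel_path).is_file()
    }
```

Calling `missing_model_files("path/to/unpacked/model")` returns an empty dict when everything is in place. Anything it reports either has to be located under a different name in the archive or generated by the build scripts.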

As for the scripts for building models, the ones currently in the server/lm-script folder were made to work with the ASpIRE chain model's build process. I remember looking at the Librispeech model in 2020 and being put off because it used a program called g2p to make the lexicon files. Supporting that would have required changes to the Dockerfile as well as rewriting all the scripts I had working. But there is no reason it wouldn't work if you wanted to install g2p and play around with the scripts. If you look at the run.sh file in https://github.com/kaldi-asr/kaldi/tree/master/egs/gigaspeech/s5 you'll see that some of the "stages" run commands related to the lexicon and dictionary. Basically, those need to be run with the new dictionary and language model files located in the right place to be used as input. Then the final stages (the ones that run utils/mkgraph.sh) have to be run again to make the final model dir that will be used by the server.
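To make the order of those stages concrete, here is a small sketch that only assembles the command lines and does not run anything. The three scripts (utils/prepare_lang.sh, utils/format_lm.sh, utils/mkgraph.sh) are standard Kaldi utilities, but every directory name, the `<UNK>` entry, and the helper `graph_rebuild_commands` are hypothetical placeholders. None of this is copied from the Gigaspeech run.sh.

```python
import shlex


def graph_rebuild_commands(dict_dir, arpa_lm, model_dir,
                           lang_dir="data/lang", oov="<UNK>"):
    """Build the command lines for regenerating the decoding graph.

    All paths are hypothetical placeholders; nothing is executed here.
    """
    lang_test = lang_dir + "_test"
    return [
        # 1. Turn the dictionary directory into a lang directory (L.fst etc.).
        ["utils/prepare_lang.sh", dict_dir, oov, "data/local/lang_tmp", lang_dir],
        # 2. Compile the gzipped ARPA language model into G.fst.
        ["utils/format_lm.sh", lang_dir, arpa_lm,
         f"{dict_dir}/lexicon.txt", lang_test],
        # 3. Compose the final HCLG graph; chain models use self-loop scale 1.0.
        ["utils/mkgraph.sh", "--self-loop-scale", "1.0",
         lang_test, model_dir, f"{model_dir}/graph"],
    ]


if __name__ == "__main__":
    # Print the commands so they can be reviewed before running them by hand.
    for cmd in graph_rebuild_commands("data/local/dict", "lm.arpa.gz",
                                      "exp/chain/tdnn"):
        print(shlex.join(cmd))
```

Printing rather than executing is deliberate: the commands need to be run from inside a Kaldi recipe directory, and the paths should be checked against the actual model layout first.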

I'll take a look at the Gigaspeech model later, when I have more time, to see how much work it would be to get everything updated. I'm also a little worried that kaldi-gstreamer-server looks like it hasn't been updated since 2020. Kaldi itself is still very active, though, so maybe the server is just stable and works with the newer models.