gooofy / zamia-speech

Open tools and data for cloudless automatic speech recognition
GNU Lesser General Public License v3.0

Training for tdnn_sp #92

Closed: dpny518 closed this 4 years ago

dpny518 commented 4 years ago

It seems the training for this model was removed from https://github.com/gooofy/zamia-speech/blob/master/data/src/speech/kaldi-run-chain.sh, which now only trains tdnn_250 and tdnn_f?

joazoa commented 4 years ago

Why do you say so? It is a modified version of https://github.com/kaldi-asr/kaldi/blob/master/egs/tedlium/s5_r2/local/chain/tuning/run_tdnn_1g.sh

tdnn_250 was added to it. AFAIK, the larger models have not been trained with it but adapted.

dpny518 commented 4 years ago

I mean, if you look at the README:

kaldi-generic-en-tdnn_f Large nnet3-chain factorized TDNN model, trained on ~1200 hours of audio. Has decent background noise resistance and can also be used on phone recordings. Should provide the best accuracy but is a bit more resource intensive than the other models.
kaldi-generic-en-tdnn_sp Large nnet3-chain model, trained on ~1200 hours of audio. Has decent background noise resistance and can also be used on phone recordings. Less accurate but also slightly less resource intensive than the tdnn_f model.
kaldi-generic-en-tdnn_250 Same as the larger models but less resource intensive, suitable for use in embedded applications (e.g. a Raspberry Pi 3).

The pipeline to train the "kaldi-generic-en-tdnn_sp" model was removed in this commit:

https://github.com/gooofy/zamia-speech/commit/da3bc53e6c3d79e199047142cef4d3802160e903#diff-cb7e3644a72cdf9c931e6ade5476b438

It was the one with dim-450. It was a great model, perfect for ASR servers, since tdnn_f is a little too large. I think it would be good for the script to build 250, 450 and tdnn_f, plus a 250 version without i-vectors. A 250_noivector model would be better for embedded use, as it is smaller and the WER isn't affected as much. All you need to do is remove these lines:

https://github.com/gooofy/zamia-speech/blob/master/data/src/speech/kaldi-run-chain.sh#L265
https://github.com/gooofy/zamia-speech/blob/master/data/src/speech/kaldi-run-chain.sh#L271
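
To illustrate what "dropping the i-vector input" usually means for a plain TDNN chain model, here is a minimal, hypothetical sketch that prints the head of a Kaldi nnet3 xconfig for each proposed variant. It is not taken from kaldi-run-chain.sh: the 40-dim input, the 100-dim i-vectors, the `relu-batchnorm-layer` layer type, the splicing, and the helper name `tdnn_xconfig_head` are all assumptions based on common Kaldi chain recipes, and the lines linked above may differ. tdnn_f uses a different (factorized) architecture and is left out.

```python
"""Hypothetical sketch: emit the head of a Kaldi nnet3 xconfig for a plain
TDNN chain model, with or without an i-vector input.

Assumptions (NOT taken from kaldi-run-chain.sh): 40-dim acoustic input,
100-dim i-vectors, Append(-1,0,1) splicing, relu-batchnorm-layer layers.
"""


def tdnn_xconfig_head(dim: int, use_ivector: bool, num_layers: int = 3) -> str:
    lines = []
    if use_ivector:
        # Separate i-vector stream; ReplaceIndex below reuses it for every frame.
        lines.append("input dim=100 name=ivector")
    lines.append("input dim=40 name=input")
    if use_ivector:
        lda_input = "Append(-1,0,1,ReplaceIndex(ivector, t, 0))"
    else:
        # Without i-vectors the LDA layer only sees spliced acoustic features.
        lda_input = "Append(-1,0,1)"
    lines.append(
        f"fixed-affine-layer name=lda input={lda_input} "
        "affine-transform-file=$dir/configs/lda.mat"
    )
    # Hidden TDNN layers; their width is the "250" / "450" in the model names.
    lines.append(f"relu-batchnorm-layer name=tdnn1 dim={dim}")
    for i in range(2, num_layers + 1):
        lines.append(
            f"relu-batchnorm-layer name=tdnn{i} input=Append(-1,0,1) dim={dim}"
        )
    return "\n".join(lines)


# Variants proposed above; "tdnn_450" is a hypothetical name for the dim-450 setup.
VARIANTS = {
    "tdnn_250": (250, True),
    "tdnn_450": (450, True),
    "tdnn_250_noivector": (250, False),
}

if __name__ == "__main__":
    for name, (dim, use_ivector) in VARIANTS.items():
        print(f"# {name}")
        print(tdnn_xconfig_head(dim, use_ivector))
        print()
```

In a real recipe, removing i-vectors likely also means not passing `--feat.online-ivector-dir` to `steps/nnet3/chain/train.py` and dropping the i-vector options at decode time; check the script before relying on this.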
joazoa commented 4 years ago

Ah, I see what you mean now. You can easily add it back. This is the issue where gooofy explains why it was removed: https://github.com/gooofy/zamia-speech/issues/61

dpny518 commented 4 years ago

Thanks!