gooofy / zamia-speech

Open tools and data for cloudless automatic speech recognition
GNU Lesser General Public License v3.0

Runtime error on Demo program #43

Closed by himanshk96 5 years ago

himanshk96 commented 5 years ago

I have been following the GitHub Quickstart, which converts four demo WAV files to text. It works fine, but when I use my own WAV file it throws the error below:

Traceback (most recent call last):
  File "kaldi_decode_wav.py", line 72, in <module>
    if decoder.decode_wav_file(wavfile):
  File "kaldiasr/nnet3.pyx", line 207, in kaldiasr.nnet3.KaldiNNet3OnlineDecoder.decode_wav_file (kaldiasr/nnet3.cpp:4726)
  File "kaldiasr/nnet3.pyx", line 170, in kaldiasr.nnet3.KaldiNNet3OnlineDecoder.decode (kaldiasr/nnet3.cpp:3968)
RuntimeError

The file I am using is a Vimeo video converted to WAV with youtube-dl. I got the WAV file using this command:

youtube-dl --extract-audio --audio-format wav https://vimeo.com/73643788

I give this file as input to kaldi_decode_wav.py.

Can anyone help me figure out what I am doing wrong?

svenha commented 5 years ago

Is your input file 16 kHz mono? You can use soxi to display the audio format.
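If soxi is not at hand, the same check can be sketched in Python with the standard-library wave module. The helper names describe_wav and is_16k_mono_16bit are made up for illustration and are not part of zamia-speech; the 16-bit part of the check follows gooofy's comment further down in this thread.

```python
import wave


def describe_wav(path):
    """Return (channels, sample_rate_hz, bits_per_sample) for a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        return wav.getnchannels(), wav.getframerate(), wav.getsampwidth() * 8


def is_16k_mono_16bit(path):
    """True if the file is mono, 16 kHz, 16-bit PCM."""
    return describe_wav(path) == (1, 16000, 16)
```

Calling describe_wav on the downloaded file before passing it to kaldi_decode_wav.py should show quickly whether the format is the likely cause of the RuntimeError.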

himanshk96 commented 5 years ago

I converted my file to 16000 Hz. Thank you so much, it works like a charm. I didn't read anywhere that it requires a WAV file with a 16 kHz sample rate. Anyway, it's solved!

himanshk96 commented 5 years ago

I am working in the academic domain. Are there any guides for adding vocabulary to a pretrained model instead of retraining the whole thing?

svenha commented 5 years ago

I don't know of any such guides, but this section might be helpful: http://kaldi-asr.org/doc/online_decoding.html#online_decoding_nnet2_vocab

But you will probably need some files that were used to build the pretrained model and are not shipped with it. (?)

gooofy commented 5 years ago

you can follow the model adaptation section in our README which does allow for adaptation to a custom dict:

https://github.com/gooofy/zamia-speech#model-adaptation

svenha commented 5 years ago

Thanks for the pointer.

svenha commented 5 years ago

@gooofy What happens if a new word in the adaptation lexicon is not in the language model (lm.arpa) that is reused? I am asking because I cannot get new words to be recognized. (Should I put this discussion into a new issue?)

gooofy commented 5 years ago

I guess you will have to rebuild the language model first in that case - should be a fairly quick process. You can use either srilm or kenlm for that task.
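To see which adaptation words would force such a rebuild, one option is to compare the lexicon against the unigram section of the ARPA file. The following is a minimal sketch, assuming a plain-text (uncompressed) lm.arpa in the standard ARPA layout; the function names are hypothetical, and the comparison is an exact, case-sensitive string match.

```python
def arpa_unigrams(path):
    """Collect the words listed in the 1-grams section of an ARPA file."""
    words = set()
    in_unigrams = False
    with open(path, encoding="utf-8") as arpa:
        for line in arpa:
            line = line.strip()
            if line == "\\1-grams:":
                in_unigrams = True
            elif line.startswith("\\"):
                # Any other section marker (\2-grams:, \end\) ends the unigrams.
                if in_unigrams:
                    break
            elif in_unigrams and line:
                # Each entry is: log10_prob  word  [backoff_weight]
                fields = line.split()
                if len(fields) >= 2:
                    words.add(fields[1])
    return words


def missing_from_lm(lexicon_words, arpa_path):
    """Return the lexicon words that have no unigram entry in the LM."""
    vocab = arpa_unigrams(arpa_path)
    return sorted(w for w in lexicon_words if w not in vocab)
```

Any word this reports has no unigram entry in the reused language model, which would be consistent with new words not being recognized until the model is rebuilt.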

svenha commented 5 years ago

I used kenlm and it worked. Thanks!

gooofy commented 5 years ago

cool - thanks for the feedback! :)

AndreiObert commented 5 years ago

Hello. Sorry to comment on a closed ticket, but I have the same problem, only with microphone input. Do I have to change its sampling frequency somehow?

gooofy commented 5 years ago

the models expect 16-bit, 16 kHz mono audio input - if your recording setup produces anything other than that, you will have to either change your setup or use a converter.
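For recordings that are already saved as WAV, a converter can be sketched with the Python standard library alone. This is an illustration under stated assumptions, not part of zamia-speech: the function name to_16k_mono is hypothetical, the input must be 16-bit PCM, and resampling is done by plain linear interpolation with no anti-aliasing filter, so a dedicated resampler such as sox will usually give better quality.

```python
import array
import sys
import wave


def to_16k_mono(src_path, dst_path, target_rate=16000):
    """Convert a 16-bit PCM WAV file to mono at target_rate (default 16 kHz)."""
    with wave.open(src_path, "rb") as src:
        if src.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM input")
        channels = src.getnchannels()
        rate = src.getframerate()
        samples = array.array("h")
        samples.frombytes(src.readframes(src.getnframes()))
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian

    # Downmix: average the interleaved channel samples of each frame.
    mono = [sum(samples[i:i + channels]) // channels
            for i in range(0, len(samples) - channels + 1, channels)]

    # Resample: linear interpolation between neighbouring input samples.
    n_out = len(mono) * target_rate // rate
    out = array.array("h")
    for j in range(n_out):
        pos = j * rate / target_rate
        i = int(pos)
        frac = pos - i
        nxt = mono[min(i + 1, len(mono) - 1)]
        out.append(int(round(mono[i] * (1 - frac) + nxt * frac)))
    if sys.byteorder == "big":
        out.byteswap()

    with wave.open(dst_path, "wb") as dst:
        dst.setnchannels(1)
        dst.setsampwidth(2)
        dst.setframerate(target_rate)
        dst.writeframes(out.tobytes())
```

For live microphone input, it is usually simpler to configure the recording setup to capture 16 kHz mono directly than to convert afterwards.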