gooofy / py-kaldi-asr

Some simple wrappers around kaldi-asr intended to make using kaldi's (online) decoders as convenient as possible.
Apache License 2.0
170 stars, 56 forks

How to run decoding on a chain model I built myself #45

Open kelvinqin opened 3 years ago

kelvinqin commented 3 years ago

Guenter, thanks for your work. I successfully compiled and tested your code, and I can now decode with your pre-trained model (kaldi-generic-en-tdnn_f-r20190609). Great work!

Meanwhile, I am learning Kaldi and have almost finished building my first model using one of the Mandarin recipes (aishell). I would like to know whether it is possible to run decoding with your code using my own model.

My model is a chain model which was built following the standard recipe (kaldi/egs/aishell/s5/local/chain/run_tdnn.sh)

One difficulty I foresee is collecting all the required data files into the model directory. Please advise if there is documentation listing which files are needed and where each one comes from in the Kaldi directory tree. From reading your code, it seems I just need to collect the files referenced in the following code:

    cdef unicode mfcc_config           = u'%s/conf/mfcc_hires.conf'                  % self.modeldir
    cdef unicode word_symbol_table     = u'%s/%s/graph/words.txt'                    % (self.modeldir, self.model)
    cdef unicode model_in_filename     = u'%s/%s/final.mdl'                          % (self.modeldir, self.model)
    cdef unicode splice_conf_filename  = u'%s/ivectors_test_hires/conf/splice.conf'  % self.modeldir
    cdef unicode fst_in_str            = u'%s/%s/graph/HCLG.fst'                     % (self.modeldir, self.model)
    cdef unicode align_lex_filename    = u'%s/%s/graph/phones/align_lexicon.int'     % (self.modeldir, self.model)

    self.ie_conf_f.write((u"--cmvn-config=%s/conf/online_cmvn.conf\n" % self.modeldir).encode('utf8'))
    self.ie_conf_f.write((u"--ivector-period=%d\n" % online_ivector_period).encode('utf8'))
    self.ie_conf_f.write((u"--splice-config=%s\n" % splice_conf_filename).encode('utf8'))
    self.ie_conf_f.write((u"--lda-matrix=%s/extractor/final.mat\n" % self.modeldir).encode('utf8'))
    self.ie_conf_f.write((u"--global-cmvn-stats=%s/extractor/global_cmvn.stats\n" % self.modeldir).encode('utf8'))
    self.ie_conf_f.write((u"--diag-ubm=%s/extractor/final.dubm\n" % self.modeldir).encode('utf8'))
    self.ie_conf_f.write((u"--ivector-extractor=%s/extractor/final.ie\n" % self.modeldir).encode('utf8'))
    self.ie_conf_f.write((u"--num-gselect=%d\n" % num_gselect).encode('utf8'))
    self.ie_conf_f.write((u"--min-post=%f\n" % min_post).encode('utf8'))
    self.ie_conf_f.write((u"--posterior-scale=%f\n" % posterior_scale).encode('utf8'))
    self.ie_conf_f.write((u"--max-remembered-frames=1000\n").encode('utf8'))
    self.ie_conf_f.write((u"--max-count=%d\n" % max_count).encode('utf8'))
    self.ie_conf_f.flush()

Could you kindly elaborate a little on where in the Kaldi directory tree each of these files comes from?

Thanks! Kelvin

kelvinqin commented 3 years ago

Guenter, I have figured it out. :-) Thanks! Kelvin

svenha commented 3 years ago

@kelvinqin As aishell is a popular recipe, would you mind sharing your solution? :-)
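
For readers with the same question, the file paths quoted in the first comment can be turned into a quick layout check. The following is only a minimal sketch: the list of relative paths is copied from that Cython excerpt, while the names `REQUIRED_FILES` and `missing_model_files` are hypothetical, and the default sub-directory name `"model"` is an assumption (it stands in for `self.model` in the excerpt). It does not say where in a Kaldi experiment directory each file should be copied from; it only reports which expected files are absent from a candidate model directory.

```python
from pathlib import Path

# Relative paths taken from the Cython excerpt quoted in the first comment.
# "{model}" stands for the model sub-directory (self.model in the excerpt).
REQUIRED_FILES = [
    "conf/mfcc_hires.conf",
    "conf/online_cmvn.conf",
    "ivectors_test_hires/conf/splice.conf",
    "extractor/final.mat",
    "extractor/global_cmvn.stats",
    "extractor/final.dubm",
    "extractor/final.ie",
    "{model}/final.mdl",
    "{model}/graph/words.txt",
    "{model}/graph/HCLG.fst",
    "{model}/graph/phones/align_lexicon.int",
]


def missing_model_files(modeldir, model="model"):
    """Return the expected relative paths that are not files under modeldir.

    The default model="model" is an assumption, not confirmed by the thread.
    """
    root = Path(modeldir)
    expected = [rel.format(model=model) for rel in REQUIRED_FILES]
    return [rel for rel in expected if not (root / rel).is_file()]
```

Calling `missing_model_files("/path/to/my-model-dir")` returns a list of the relative paths that still need to be copied in; an empty list means the layout matches what the excerpt reads.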