yh1008 / speech-to-text

mixlingual speech recognition system; hybrid (GMM+NNet) model; Kaldi + Keras
http://llcao.net/cu-deeplearning17/project.html

mkgraph.sh: expected data/lang/G.fst to exist #10

Closed yh1008 closed 7 years ago

yh1008 commented 7 years ago

The above error occurs when calling utils/mkgraph.sh, like the following:

utils/mkgraph.sh data/lang exp/mono_10k exp/mono_10k/graph

or

utils/mkgraph.sh data/lang exp/tri1 exp/tri1/graph

The monophone (exp/mono_10k) and triphone (exp/tri1) systems were trained with the following commands:

steps/train_mono.sh --boost-silence 1.25 --nj 8 --cmd run.pl \
data/train_10k data/lang exp/mono_10k

and

steps/train_deltas.sh  2000 11000 data/train data/lang exp/mono_ali exp/tri1
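Since the error message names data/lang/G.fst, a quick pre-flight check can confirm whether the file is really missing before calling utils/mkgraph.sh. The following is a minimal sketch, not part of Kaldi: `check_grammar` is a hypothetical helper name, and the mkgraph.sh call is left commented out so the snippet runs without a Kaldi installation.

```shell
# Hypothetical helper (not a Kaldi script): report whether a lang
# directory contains G.fst, the file mkgraph.sh expects to exist.
check_grammar() {
  lang_dir="$1"
  if [ -f "$lang_dir/G.fst" ]; then
    echo "found $lang_dir/G.fst"
    return 0
  fi
  # Message goes to stderr so stdout stays clean for scripting.
  echo "missing $lang_dir/G.fst: build it from an ARPA LM with arpa2fst first" >&2
  return 1
}

# Example use: only build the decoding graph when the grammar is present.
# check_grammar data/lang && \
#   utils/mkgraph.sh data/lang exp/mono_10k exp/mono_10k/graph
```

If the check fails, the grammar FST has not been generated yet, which matches the error in the title of this issue.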

kailichen commented 7 years ago

Maybe we can try the following:

Preparing the grammar G:

gunzip -c data_prep/lm.arpa.gz | \
  arpa2fst --disambig-symbol=#0 \
    --read-symbol-table=data/words.txt - data/G.fst

Refer to: Decoding-graph creation recipe (test time)

yh1008 commented 7 years ago

It looks like we need to create an ARPA language model first and then use arpa2fst to generate G.fst:

echo
echo "===== LANGUAGE MODEL CREATION ====="
echo "===== MAKING lm.arpa ====="
echo

# Locate SRILM's ngram-count; fall back to the copy under $KALDI_ROOT/tools.
loc=$(which ngram-count)
if [ -z "$loc" ]; then
   if uname -a | grep 64 >/dev/null; then
      sdir=$KALDI_ROOT/tools/srilm/bin/i686-m64
   else
      sdir=$KALDI_ROOT/tools/srilm/bin/i686
   fi
   if [ -f "$sdir/ngram-count" ]; then
      echo "Using SRILM language modelling tool from $sdir"
      export PATH=$PATH:$sdir
   else
      echo "SRILM toolkit is probably not installed.
            Instructions: tools/install_srilm.sh"
      exit 1
   fi
fi

# lm_order (the n-gram order) must be set before this point.
local=data/local
mkdir -p $local/tmp
ngram-count -order $lm_order -write-vocab $local/tmp/vocab-full.txt -wbdiscount -text $local/corpus.txt -lm $local/tmp/lm.arpa

echo
echo "===== MAKING G.fst ====="
echo

# Compile the ARPA language model into the grammar FST that mkgraph.sh expects.
lang=data/lang
arpa2fst --disambig-symbol=#0 --read-symbol-table=$lang/words.txt $local/tmp/lm.arpa $lang/G.fst

The code sample above is from the Kaldi for Dummies tutorial.
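Before handing lm.arpa to arpa2fst, it may be worth confirming that the file declares the n-gram order that was requested with -order. An ARPA file starts with a `\data\` header followed by one `ngram N=count` line per order, so the highest N can be read from it. This is a rough sketch under that assumption; `arpa_order` is a hypothetical helper name, not a Kaldi or SRILM tool.

```shell
# Hypothetical helper: print the highest n-gram order declared in the
# \data\ header of an ARPA file; exits non-zero if no header is found.
arpa_order() {
  awk -F'[ =]' '
    /^\\data\\/            { in_header = 1; next }  # start of the header
    in_header && /^ngram / { order = $2; next }     # e.g. "ngram 2=1234"
    in_header && /^\\/     { exit }                 # next section, e.g. \1-grams:
    END { if (order == "") exit 1; print order }
  ' "$1"
}

# Example use (path taken from the script above):
# arpa_order data/local/tmp/lm.arpa
```

The printed value should match $lm_order; if the helper exits non-zero, the file is probably not a valid ARPA model and arpa2fst is unlikely to produce a usable G.fst from it.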