kaldi-asr / kaldi

kaldi-asr/kaldi is the official location of the Kaldi project.
http://kaldi-asr.org

nnet-latgen-faster vs online2-wav-nnet2-latgen-faster #3614

Closed tingyang01 closed 4 years ago

tingyang01 commented 4 years ago
Hello, I've built a French STT model using wsj/s5/local/online/run_nnet2.sh.
I decoded the test data with the following script:
if [ $stage -le 10 ]; then
  for lm_suffix in tgpr bd_tgpr; do
    graph_dir=exp/tri4b/graph_${lm_suffix}
    for year in eval92 dev93; do
      steps/nnet2/decode.sh --nj 8 --cmd "$decode_cmd" \
          --online-ivector-dir exp/nnet2_online/ivectors_test_$year \
          $graph_dir data/test_${year}_hires $dir/decode_${lm_suffix}_${year} || exit 1;
    done
  done
fi

if [ $stage -le 11 ]; then
  for lm_suffix in tgpr bd_tgpr; do
    graph_dir=exp/tri4b/graph_${lm_suffix}
    for year in eval92 dev93; do
      steps/online/nnet2/decode.sh --cmd "$decode_cmd" --nj 8 \
        "$graph_dir" data/test_${year} ${dir}_online/decode_${lm_suffix}_${year} || exit 1;
    done
  done
fi
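
Rather than comparing the transcripts by eye, the two decode directories can be compared by WER, assuming the scoring step at the end of both decode scripts completed and wrote its wer_* files (Kaldi's own utils/best_wer.sh does this job in the recipes). The function below, best_wer, is a hypothetical stand-in written only to show the idea; the directory names in the commented usage are taken from the logs in this thread:

```bash
#!/usr/bin/env bash
# Hypothetical helper, similar in spirit to Kaldi's utils/best_wer.sh:
# given wer_* files written by the scoring script, print the line with
# the lowest %WER, prefixed by the file it came from.
best_wer() {
  # Each wer_* file contains a line such as
  #   %WER 12.34 [ 1016 / 8234, 90 ins, 150 del, 776 sub ]
  # grep -H prefixes every match with "filename:", so the token that ends
  # in "%WER" is followed by the WER value itself.
  grep -H '%WER' "$@" \
    | awk '{ for (i = 1; i <= NF; i++) if ($i ~ /%WER$/) { print $(i+1), $0; break } }' \
    | LC_ALL=C sort -n -k1,1 \
    | head -n 1 \
    | cut -d' ' -f2-
  # LC_ALL=C keeps "." as the decimal separator even in a French locale.
}

# Usage (directories from the logs above):
#   best_wer exp/nnet2_online/nnet_ms_a/decode_SRILM/wer_*
#   best_wer exp/nnet2_online/nnet_ms_a_online/decode_SRILM/wer_*
```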

Here, steps/nnet2/decode.sh ultimately runs the following command, which uses nnet-latgen-faster:

nnet-latgen-faster --minimize=false --max-active=7000 --min-active=200 --beam=15.0 --lattice-beam=8.0 --acoustic-scale=0.1 --allow-partial=true --word-symbol-table=exp/tri4b/graph_SRILM/words.txt exp/nnet2_online/nnet_ms_a/final.mdl exp/tri4b/graph_SRILM/HCLG.fst "ark,s,cs:apply-cmvn --norm-means=false --norm-vars=false --utt2spk=ark:data/test_hires/split8/1/utt2spk scp:data/test_hires/split8/1/cmvn.scp scp:data/test_hires/split8/1/feats.scp ark:- | paste-feats --length-tolerance=10 ark:- 'ark,s,cs:utils/filter_scp.pl data/test_hires/split8/1/utt2spk exp/nnet2_online/ivectors_test/ivector_online.scp | subsample-feats --n=-10 scp:- ark:- | copy-matrix --scale=1.0 ark:- ark:-|' ark:- |" 'ark:|gzip -c > exp/nnet2_online/nnet_ms_a/decode_SRILM/lat.1.gz'
apply-cmvn --norm-means=false --norm-vars=false --utt2spk=ark:data/test_hires/split8/1/utt2spk scp:data/test_hires/split8/1/cmvn.scp scp:data/test_hires/split8/1/feats.scp ark:-
paste-feats --length-tolerance=10 ark:- 'ark,s,cs:utils/filter_scp.pl data/test_hires/split8/1/utt2spk exp/nnet2_online/ivectors_test/ivector_online.scp | subsample-feats --n=-10 scp:- ark:- | copy-matrix --scale=1.0 ark:- ark:-|' ark:-
subsample-feats --n=-10 scp:- ark:-
copy-matrix --scale=1.0 ark:- ark:-
13-1410-0001 comme le sol car vous de retour à son palais il fit apporter toutes ces pierres il ils henri le très craint escale cité écartait qui l' irlande de celle qu' il allait du présenta latin
LOG (nnet-latgen-faster[5.5.463~1-9f3d8]:DecodeUtteranceLatticeFaster():decoder-wrappers.cc:289) Log-like per frame for utterance 13-1410-0001 is 0.175817 over 1128 frames.
13-1410-0002 et les employait à sans qu' il parut qu' ils espèce au cou avançait errants jarre et prendre dortoir à plusieurs reprises et en un moment elle n' avait pas achever la moitié délivra âge
LOG (nnet-latgen-faster[5.5.463~1-9f3d8]:DecodeUtteranceLatticeFaster():decoder-wrappers.cc:289) Log-like per frame for utterance 13-1410-0002 is 0.227961 over 1318 frames.
13-1410-0003 ils implorer à toute sa dessus tint avec ce que le grand vizir et les prête à des siens et tout ce qu' elle peur faire avec toute cela lui au plus d' achever la moitié de la croît et
LOG (nnet-latgen-faster[5.5.463~1-9f3d8]:DecodeUtteranceLatticeFaster():decoder-wrappers.cc:289) Log-like per frame for utterance 13-1410-0003 is 0.191348 over 1150 frames.
13-1410-0004 alla d' ânes qui connut que la satan s' effacer inutilement de rendre la jalousie semblables aux autres et que jamais il n' en viendrait ascendant leur fit venait allez affaire à fa et leur dit il en seulement de sa c' est la travail mais même de défaire puces qu' ils avaient fait et de reporté au sultan toutes ces pierres nez avec laquelle avait empruntait du grand vizir
LOG (nnet-latgen-faster[5.5.463~1-9f3d8]:DecodeUtteranceLatticeFaster():decoder-wrappers.cc:289) Log-like per frame for utterance 13-1410-0004 is 0.218075 over 2375 frames.
13-1410-0007 il ne fasse les joie et arrêt lézard fesses la à aimer plus et six semaines à faire un fait étui ambassadeur il serait téméraire elle la serra latin sage dans la salle en et tira le allant qu' elle avait celui hélas frotta
LOG (nnet-latgen-faster[5.5.463~1-9f3d8]:DecodeUtteranceLatticeFaster():decoder-wrappers.cc:289) Log-like per frame for utterance 13-1410-0007 is 0.181275 over 1415 frames.
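
In the nnet-latgen-faster command above, the iVectors are not at the frame rate of the MFCCs. Assuming the online iVectors in ivector_online.scp were stored once every 10 frames (which is what --n=-10 implies), subsample-feats --n=-10 repeats each iVector row 10 times, and paste-feats --length-tolerance=10 appends the iVectors to the features, trimming both to the shorter length when they differ by at most 10 frames. The toy shell functions below mimic that behavior on plain text "matrices" with one frame per line; repeat_rows and paste_with_tolerance are hypothetical names, and this is an illustration, not Kaldi code:

```bash
#!/usr/bin/env bash
# Toy illustration (not Kaldi code) of the feature pipeline above:
#   subsample-feats --n=-10           -> repeat each iVector row 10 times
#   paste-feats --length-tolerance=10 -> append columns, trimming to the
#                                        shorter input if lengths differ
#                                        by at most 10 frames

# repeat_rows N FILE: print each line of FILE N times.
repeat_rows() {
  awk -v n="$1" '{ for (k = 0; k < n; k++) print }' "$2"
}

# paste_with_tolerance TOL A B: paste A and B side by side, trimmed to the
# shorter length; fail if the lengths differ by more than TOL frames.
paste_with_tolerance() {
  local tol=$1 a=$2 b=$3
  local la lb gap n
  la=$(wc -l < "$a")
  lb=$(wc -l < "$b")
  gap=$(( la > lb ? la - lb : lb - la ))
  if (( gap > tol )); then
    echo "length mismatch: $la vs $lb frames" >&2
    return 1
  fi
  n=$(( la < lb ? la : lb ))
  paste -d' ' <(head -n "$n" "$a") <(head -n "$n" "$b")
}
```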

Here, steps/online/nnet2/decode.sh ultimately runs the following command, which uses online2-wav-nnet2-latgen-faster:

online2-wav-nnet2-latgen-faster --online=true --do-endpointing=false --config=exp/nnet2_online/nnet_ms_a_online/conf/online_nnet2_decoding.conf --max-active=7000 --beam=15.0 --lattice-beam=6.0 --acoustic-scale=0.1 --word-symbol-table=exp/tri4b/graph_SRILM/words.txt exp/nnet2_online/nnet_ms_a_online/final.mdl exp/tri4b/graph_SRILM/HCLG.fst ark:data/test_hires/split8/1/spk2utt 'ark,s,cs:extract-segments scp,p:data/test_hires/split8/1/wav.scp data/test_hires/split8/1/segments ark:- |' 'ark:|gzip -c > exp/nnet2_online/nnet_ms_a_online/decode_SRILM/lat.1.gz'
LOG (online2-wav-nnet2-latgen-faster[5.5.463~1-9f3d8]:ComputeDerivedVars():ivector-extractor.cc:183) Computing derived variables for iVector extractor
LOG (online2-wav-nnet2-latgen-faster[5.5.463~1-9f3d8]:ComputeDerivedVars():ivector-extractor.cc:204) Done.
extract-segments scp,p:data/test_hires/split8/1/wav.scp data/test_hires/split8/1/segments ark:-
13-1410-0001 de
LOG (online2-wav-nnet2-latgen-faster[5.5.463~1-9f3d8]:main():online2-wav-nnet2-latgen-faster.cc:276) Decoded utterance 13-1410-0001
13-1410-0002 de
LOG (online2-wav-nnet2-latgen-faster[5.5.463~1-9f3d8]:main():online2-wav-nnet2-latgen-faster.cc:276) Decoded utterance 13-1410-0002
13-1410-0003 rit
LOG (online2-wav-nnet2-latgen-faster[5.5.463~1-9f3d8]:main():online2-wav-nnet2-latgen-faster.cc:276) Decoded utterance 13-1410-0003
13-1410-0004 quand
LOG (online2-wav-nnet2-latgen-faster[5.5.463~1-9f3d8]:main():online2-wav-nnet2-latgen-faster.cc:276) Decoded utterance 13-1410-0004
13-1410-0007 toute je
LOG (online2-wav-nnet2-latgen-faster[5.5.463~1-9f3d8]:main():online2-wav-nnet2-latgen-faster.cc:276) Decoded utterance 13-1410-0007
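
One way to narrow the problem down is to check which decoder options actually differ between the two commands. The hypothetical helper option_diff below extracts --name=value tokens from two strings and prints only those that appear in one of them. The two strings are the leading decoder options copied from the commands above; the feature pipelines are left out because they carry their own --options:

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the --name=value options that differ between
# two option strings. Positional arguments are ignored.
option_diff() {
  local a b
  # One token per line, keep only --name=value tokens, sort bytewise.
  a=$(tr ' ' '\n' <<< "$1" | grep -E '^--[^=]+=' | LC_ALL=C sort)
  b=$(tr ' ' '\n' <<< "$2" | grep -E '^--[^=]+=' | LC_ALL=C sort)
  # comm -3 hides options common to both; options only in the first string
  # are printed flush left, options only in the second are tab-indented.
  LC_ALL=C comm -3 <(printf '%s\n' "$a") <(printf '%s\n' "$b")
}

# Leading options of the two commands in this thread.
offline='--minimize=false --max-active=7000 --min-active=200 --beam=15.0 --lattice-beam=8.0 --acoustic-scale=0.1 --allow-partial=true'
online='--online=true --do-endpointing=false --config=exp/nnet2_online/nnet_ms_a_online/conf/online_nnet2_decoding.conf --max-active=7000 --beam=15.0 --lattice-beam=6.0 --acoustic-scale=0.1'
option_diff "$offline" "$online"
```

For these two commands, --beam, --max-active and --acoustic-scale are identical; what differs is --minimize, --min-active, --allow-partial and --lattice-beam (8.0 vs 6.0) on the nnet-latgen-faster side, and --online, --do-endpointing and --config on the online side. The lattice beam only controls how much of the lattice is kept around the best path, so the most visible remaining difference is feature extraction: online2-wav-nnet2-latgen-faster computes MFCCs and iVectors itself from the files referenced by --config, whereas nnet-latgen-faster reads precomputed features and iVectors.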

As you can see in the logs above, the output of online2-wav-nnet2-latgen-faster is much worse than that of nnet-latgen-faster: it produces only one or two words per utterance. I tried changing some parameters, such as --acoustic-scale=1.0, but the results were still poor. I'd be grateful for any help finding the cause. Please share your ideas. Thank you.
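
For context on the --acoustic-scale experiment: in Kaldi's decoders this flag is the weight α on the acoustic log-likelihoods. In our own notation (not the thread's), the decoder looks for the path π through HCLG.fst that minimizes the graph cost plus the scaled acoustic cost, where C_HCLG(π) is the negated log weight of the graph path (LM, lexicon and transition probabilities), x_t is the feature vector at frame t, and s_t(π) is the pdf that π visits at frame t:

```latex
\hat{\pi} = \arg\min_{\pi} \Big[\, C_{\mathrm{HCLG}}(\pi)
            \;-\; \alpha \sum_{t=1}^{T} \log p\big(\mathbf{x}_t \mid s_t(\pi)\big) \Big],
\qquad \alpha = \texttt{--acoustic-scale}
```

Both commands above already used α = 0.1, the usual value for nnet2 models; it gives the same best path as weighting the graph cost by 1/α = 10 with unscaled acoustics. α = 1.0 is the setting used for chain models such as those trained by local/chain/run_tdnn.sh, so for an nnet2 model it mainly lets the acoustic scores dominate the graph, and on its own it is not a fix.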

danpovey commented 4 years ago

That is a super-old recipe and it would be hard for me to remember the details to figure it out. You shouldn't be using that recipe; use local/chain/run_tdnn.sh. The results are way better.