kaldi-asr / kaldi

kaldi-asr/kaldi is the official location of the Kaldi project.
http://kaldi-asr.org

decode error: 3884 Segmentation fault in babel multi #3082

Closed willixen closed 5 years ago

willixen commented 5 years ago

Dear all and @danpovey, I want to decode a multitask model trained with the babel_multi recipe on my own multilingual data, but decoding fails with a "Segmentation fault" (core dumped), and I cannot find any other error information. I don't think the utterances are too long, because they are the same kind as the training data, and this test data works fine with a DNN model. Please help.

The log of the decode recipe:

bash: line 1: 3895 Segmentation fault (core dumped) ( nnet3-latgen-faster --online-ivectors=scp:exp/chn863king/nnet3/ivectors_dev_hires/ivector_online.scp --online-ivector-period=10 --frames-per-chunk=50 --extra-left-context=0 --extra-right-context=0 --extra-left-context-initial=-1 --extra-right-context-final=-1 --minimize=false --max-active=7000 --min-active=200 --beam=18.0 --lattice-beam=10.0 --acoustic-scale=0.1 --allow-partial=true --word-symbol-table=exp/chn863king/tri5/graph/words.txt exp/nnet3/multi_bnf_sp/chn863king/final_adj.mdl exp/chn863king/tri5/graph/HCLG.fst "ark,s,cs:apply-cmvn --norm-means=false --norm-vars=false --utt2spk=ark:data/chn863king/dev_hires_pitch/split15/4/utt2spk scp:data/chn863king/dev_hires_pitch/split15/4/cmvn.scp scp:data/chn863king/dev_hires_pitch/split15/4/feats.scp ark:- |" "ark:|gzip -c >exp/nnet3/multi_bnf_sp/chn863king/decode_dev/lat.4.gz" ) 2>> exp/nnet3/multi_bnf_sp/chn863king/decode_dev/log/decode.4.log >> exp/nnet3/multi_bnf_sp/chn863king/decode_dev/log/decode.4.log
run.pl: 15 / 15 failed, log is in exp/nnet3/multi_bnf_sp/chn863king/decode_dev/log/decode.*.log

The contents of decode_dev/log/decode.2.log:

nnet3-latgen-faster --online-ivectors=scp:exp/chn863king/nnet3/ivectors_dev_hires/ivector_online.scp --online-ivector-period=10 --frames-per-chunk=50 --extra-left-context=0 --extra-right-context=0 --extra-left-context-initial=-1 --extra-right-context-final=-1 --minimize=false --max-active=7000 --min-active=200 --beam=18.0 --lattice-beam=10.0 --acoustic-scale=0.1 --allow-partial=true --word-symbol-table=exp/chn863king/tri5/graph/words.txt exp/nnet3/multi_bnf_sp/chn863king/final_adj.mdl exp/chn863king/tri5/graph/HCLG.fst "ark,s,cs:apply-cmvn --norm-means=false --norm-vars=false --utt2spk=ark:data/chn863king/dev_hires_pitch/split15/2/utt2spk scp:data/chn863king/dev_hires_pitch/split15/2/cmvn.scp scp:data/chn863king/dev_hires_pitch/split15/2/feats.scp ark:- |" "ark:|gzip -c >exp/nnet3/multi_bnf_sp/chn863king/decode_dev/lat.2.gz"
# Started at Sun Mar 10 09:53:13 CST 2019
#
nnet3-latgen-faster --online-ivectors=scp:exp/chn863king/nnet3/ivectors_dev_hires/ivector_online.scp --online-ivector-period=10 --frames-per-chunk=50 --extra-left-context=0 --extra-right-context=0 --extra-left-context-initial=-1 --extra-right-context-final=-1 --minimize=false --max-active=7000 --min-active=200 --beam=18.0 --lattice-beam=10.0 --acoustic-scale=0.1 --allow-partial=true --word-symbol-table=exp/chn863king/tri5/graph/words.txt exp/nnet3/multi_bnf_sp/chn863king/final_adj.mdl exp/chn863king/tri5/graph/HCLG.fst 'ark,s,cs:apply-cmvn --norm-means=false --norm-vars=false --utt2spk=ark:data/chn863king/dev_hires_pitch/split15/2/utt2spk scp:data/chn863king/dev_hires_pitch/split15/2/cmvn.scp scp:data/chn863king/dev_hires_pitch/split15/2/feats.scp ark:- |' 'ark:|gzip -c >exp/nnet3/multi_bnf_sp/chn863king/decode_dev/lat.2.gz'
LOG (nnet3-latgen-faster[5.5.152~1-78f0]:RemoveOrphanNodes():nnet-nnet.cc:948) Removed 6 orphan nodes.
LOG (nnet3-latgen-faster[5.5.152~1-78f0]:RemoveOrphanComponents():nnet-nnet.cc:847) Removing 12 orphan components.
LOG (nnet3-latgen-faster[5.5.152~1-78f0]:Collapse():nnet-utils.cc:1378) Added 6 components, removed 12
apply-cmvn --norm-means=false --norm-vars=false --utt2spk=ark:data/chn863king/dev_hires_pitch/split15/2/utt2spk scp:data/chn863king/dev_hires_pitch/split15/2/cmvn.scp scp:data/chn863king/dev_hires_pitch/split15/2/feats.scp ark:-
# Accounting: time=2 threads=1
# Ended (code 139) at Sun Mar 10 09:53:15 CST 2019, elapsed time 2 seconds

Any help will be appreciated! Best, willix

danpovey commented 5 years ago

Likely a graph/model mismatch. By design, nnet decoders now segfault when you use the wrong graph for your model; this avoids a time-consuming extra check in the decoder inner loops.

willixen commented 5 years ago

> Likely a graph/model mismatch. By design, nnet decoders now segfault when you use the wrong graph for your model; this avoids a time-consuming extra check in the decoder inner loops.

Oh, got it. Thanks, Dan. I have a follow-up question, though: I don't understand why my model and graph are mismatched. Can't I use the graph generated from the tri5 model when decoding the babel_multi DNN model? The default recipe seems to decode this way. The relevant code:

    utils/mkgraph.sh \
      data/$lang/lang_test exp/$lang/tri5 exp/$lang/tri5/graph | tee exp/$lang/tri5/mkgraph.log

I only changed data/$lang/lang to data/$lang/lang_test to make sure there is a G.fst. @danpovey

danpovey commented 5 years ago

I don't recall the details of the babel recipe, but you should check what tree the system you are decoding was built with; if it was not the tri5 tree, you would have a problem.
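One way to act on the suggestion above is to compare the `tree` file in the directory the graph was built from with the one in the directory of the model being decoded. The following is only a minimal sketch: the function name `same_tree` is hypothetical, and it assumes both directories contain a file named `tree`, which may not hold for every recipe layout (in particular, whether the nnet3 multilingual directory stores a `tree` is an assumption here, not something the thread confirms).

```bash
#!/usr/bin/env bash
# Sketch: report whether two experiment directories hold byte-identical
# `tree` files. Assumption: each directory contains a file named `tree`.
# `same_tree` is a hypothetical helper, not part of Kaldi.

same_tree() {
  local graph_src_dir=$1   # directory given to utils/mkgraph.sh, e.g. exp/$lang/tri5
  local model_dir=$2       # directory of the model being decoded (assumed layout)
  local d
  for d in "$graph_src_dir" "$model_dir"; do
    if [ ! -f "$d/tree" ]; then
      echo "missing: $d/tree"
      return 2
    fi
  done
  # cmp -s is silent and exits 0 only when the files are identical.
  if cmp -s "$graph_src_dir/tree" "$model_dir/tree"; then
    echo "match"
  else
    echo "mismatch"
    return 1
  fi
}
```

A call might look like `same_tree exp/$lang/tri5 exp/nnet3/multi_bnf_sp/$lang`. A "mismatch" result would be consistent with the graph/model mismatch described earlier in the thread; a "match" does not by itself prove the graph and model agree, since the graph could still have been built at a different time or from a different model.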
