srvk / eesen

The official repository of the Eesen project
http://arxiv.org/abs/1507.08240
Apache License 2.0

CMVN error on TIMIT Database #136

Open jasonTuZq opened 7 years ago

jasonTuZq commented 7 years ago

Hi, I ran into a problem when training an LSTM-RNN acoustic model on the TIMIT database. Here are parts of my code and the corresponding result after running. There is an error, e.g., `LOG (apply-cmvn:main():apply-cmvn.cc:129) Applied cepstral mean and variance normalization to 400 utterances, errors on 0`. I can't find what's going wrong. Can anyone help me? Thanks!

----------------------------------------- parts of my code ---------------------------

```bash
if [ $stage -le 2 ]; then
  echo =====================================================================
  echo "                    FBank Feature Generation                        "
  echo =====================================================================
  fbankdir=fbank

  # Generate the fbank features; by default 40-dimensional fbanks on each frame
  steps/make_fbank.sh --cmd "$train_cmd" --nj 32 data/data/trn exp/make_fbank/trn $fbankdir || exit 1;
  utils/fix_data_dir.sh data/data/trn || exit;
  steps/compute_cmvn_stats.sh data/data/trn exp/make_fbank/train $fbankdir || exit 1;

  steps/make_fbank.sh --cmd "$train_cmd" --nj 32 data/data/dev exp/make_fbank/dev $fbankdir || exit 1;
  utils/fix_data_dir.sh data/data/dev || exit;
  steps/compute_cmvn_stats.sh data/data/dev exp/make_fbank/train $fbankdir || exit 1;

  steps/make_fbank.sh --cmd "$train_cmd" --nj 32 data/data/tst exp/make_fbank/tst $fbankdir || exit 1;
  utils/fix_data_dir.sh data/data/tst || exit;
  steps/compute_cmvn_stats.sh data/data/tst exp/make_fbank/train $fbankdir || exit 1;
fi

if [ $stage -le 3 ]; then
  echo =====================================================================
  echo "                        Network Training                            "
  echo =====================================================================

  # Specify network structure and generate the network topology
  input_feat_dim=120  # dimension of the input features; we will use 40-dimensional fbanks with deltas and double deltas
  lstm_layer_num=3    # number of LSTM layers
  lstm_cell_dim=240   # number of memory cells in every LSTM layer

  dir=exp/train_phn_l${lstm_layer_num}_c${lstm_cell_dim}
  mkdir -p $dir

  target_num=`cat data/lang_phn/units.txt | wc -l`; target_num=$[$target_num+1];  # #targets = #labels + 1 (the blank)

  # Output the network topology
  utils/model_topo.py --input-feat-dim $input_feat_dim --lstm-layer-num $lstm_layer_num \
    --lstm-cell-dim $lstm_cell_dim --target-num $target_num \
    --fgate-bias-init 1.0 > $dir/nnet.proto || exit 1;

  # Label sequences; simply convert words into their label indices
  utils/prep_ctc_trans.py data/lang_phn/lexicon_numbers.txt data/data/trn/trans "<UNK>" | gzip -c - > $dir/labels.tr.gz
  utils/prep_ctc_trans.py data/lang_phn/lexicon_numbers.txt data/data/dev/trans "<UNK>" | gzip -c - > $dir/labels.cv.gz

  # Train the network with CTC. Refer to the script for details about the arguments
  steps/train_ctc_parallel.sh --add-deltas true --num-sequence 10 --frame-num-limit 25000 \
    --learn-rate 0.00004 --report-step 1000 --halving-after-epoch 12 \
    data/data/trn data/data/dev $dir || exit 1;

  echo =====================================================================
  echo "                             Decoding                               "
  echo =====================================================================

  # decoding
  for lm_suffix in sw1_tg sw1_fsh_tgpr; do
    steps/decode_ctc_lat.sh --cmd "$decode_cmd" --nj 20 --beam 17.0 --lattice_beam 8.0 --max-active 5000 --acwt 0.6 \
      data/lang_phn_${lm_suffix} data/data/tst $dir/decode_eval2000_${lm_suffix} || exit 1;
  done
fi
```
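As a quick sanity check of the `target_num` arithmetic in the script above, the same count can be reproduced on its own (a minimal sketch; it assumes `data/lang_phn/units.txt` lists one label per line, as Eesen's lang preparation produces):

```bash
# CTC output dimension = number of labels in units.txt + 1 for the blank.
num_labels=$(wc -l < data/lang_phn/units.txt)
target_num=$((num_labels + 1))
echo "CTC targets (labels + blank): $target_num"
```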

------------------------------- result -----------------------------

```
steps/train_ctc_parallel.sh --add-deltas true --num-sequence 10 --frame-num-limit 25000 --learn-rate 0.00004 --report-step 1000 --halving-after-epoch 12 data/data/trn data/data/dev exp/train_phn_l3_c240
feat-to-len scp:data/data/trn/feats.scp ark,t:-
feat-to-len scp:data/data/dev/feats.scp ark,t:-
copy-feats 'ark,s,cs:apply-cmvn --norm-vars=true --utt2spk=ark:data/data/trn/utt2spk scp:data/data/trn/cmvn.scp scp:exp/train_phn_l3_c240/train.scp ark:- |' ark,scp:/tmp/tmp.n5hv6njls3/train.ark,exp/train_phn_l3_c240/train_local.scp
apply-cmvn --norm-vars=true --utt2spk=ark:data/data/trn/utt2spk scp:data/data/trn/cmvn.scp scp:exp/train_phn_l3_c240/train.scp ark:-
LOG (apply-cmvn:main():apply-cmvn.cc:129) Applied cepstral mean and variance normalization to 3696 utterances, errors on 0
LOG (copy-feats:main():copy-feats.cc:100) Copied 3696 feature matrices.
copy-feats 'ark,s,cs:apply-cmvn --norm-vars=true --utt2spk=ark:data/data/dev/utt2spk scp:data/data/dev/cmvn.scp scp:exp/train_phn_l3_c240/cv.scp ark:- |' ark,scp:/tmp/tmp.n5hv6njls3/cv.ark,exp/train_phn_l3_c240/cv_local.scp
apply-cmvn --norm-vars=true --utt2spk=ark:data/data/dev/utt2spk scp:data/data/dev/cmvn.scp scp:exp/train_phn_l3_c240/cv.scp ark:-
LOG (apply-cmvn:main():apply-cmvn.cc:129) Applied cepstral mean and variance normalization to 400 utterances, errors on 0
LOG (copy-feats:main():copy-feats.cc:100) Copied 400 feature matrices.
```
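For what it's worth, the `--add-deltas true` flag in this log is what turns the 40-dimensional fbanks into the `input_feat_dim=120` the network expects (statics plus deltas and double deltas). The dimension can be verified with standard Kaldi/Eesen binaries (a sketch reusing the paths from the log; `apply-cmvn`, `add-deltas`, and `feat-to-dim` are all stock tools):

```bash
# Rebuild the feature pipeline from the log and print its dimension;
# it should report 120 for 40-dim fbanks with deltas and double deltas.
apply-cmvn --norm-vars=true --utt2spk=ark:data/data/trn/utt2spk \
    scp:data/data/trn/cmvn.scp scp:data/data/trn/feats.scp ark:- \
  | add-deltas ark:- ark:- \
  | feat-to-dim ark:- -
```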

fmetze commented 7 years ago

Not sure - this all seems fine to me? Where is the error?

riebling commented 7 years ago

Yeah maybe it's just the misleading message "errors on 0" ... even just mentioning the word 'error' can be scary in a log message :)
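In other words, `errors on 0` means zero utterances failed: `apply-cmvn` counts successes and failures separately, and that line is a success summary. One way to double-check that nothing actually failed (a sketch; the log locations are guessed from the script above and may differ on your setup):

```bash
# Per-utterance failures surface as WARNING lines; the final LOG line only
# summarizes counts, so "errors on 0" really is a clean run.
grep -i "warning" exp/make_fbank/trn/*.log exp/make_fbank/dev/*.log \
  || echo "no warnings found"
```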

jasonTuZq commented 7 years ago

But the training just exited before finishing, and I didn't get any model. Still not figuring out what's going wrong... The following is the printout:

```
Initializing model as exp/model_l4_c320/nnet/nnet.iter0
TRAINING STARTS [2017-Jun-15 11:38:32]
[NOTE] TOKEN_ACCURACY refers to token accuracy, i.e., (1.0 - token_error_rate).
EPOCH 25 RUNNING ...
Removing features tmpdir exp/model_l4_c320/Y7MuG @ 311Ubuntu
cv.ark  train.ark
```

fmetze commented 7 years ago

The logs show EPOCH 25 RUNNING - presumably you have models from epoch 0 (untrained) to 24, no? Or are you for some reason starting the training at epoch 25, which means the system tries to load epoch 24 and fails?
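A quick way to see which checkpoints actually exist, and therefore which epoch the trainer can resume from (the `nnet.iterN` naming is taken from the `Initializing model as ... nnet.iter0` line in your log):

```bash
# List the saved checkpoints; if nnet.iter24 is missing, a run that starts
# at epoch 25 has nothing to load.
ls -l exp/model_l4_c320/nnet/ | grep "nnet\.iter"
```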

bmilde commented 6 years ago

Take a look in exp/model_l4_c320/log/ and search for the training log of iteration 25. There might be some clues in the log about why the training failed. You can also try to resume from the last successful epoch by rerunning the training part of your script.
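For example (a sketch; the exact log file names depend on the Eesen version, so adjust the globs to whatever actually sits in `exp/model_l4_c320/log/`):

```bash
# Flag any training log that contains the usual failure markers ...
grep -l -i -E "error|assert|killed|no such file" exp/model_l4_c320/log/*.log
# ... then inspect the end of the epoch-25 log, whatever it is named.
tail -n 30 exp/model_l4_c320/log/*25*
```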