mravanelli / pytorch-kaldi

pytorch-kaldi is a project for developing state-of-the-art DNN/RNN hybrid speech recognition systems. The DNN part is managed by pytorch, while feature extraction, label computation, and decoding are performed with the kaldi toolkit.

Cannot reproduce LSTM result on TIMIT #216

Closed arvoelke closed 4 years ago

arvoelke commented 4 years ago

I followed the instructions in the README and was able to run an experiment with the configuration located at cfg/TIMIT_baselines/TIMIT_LSTM_fmllr_cudnn.cfg. However, instead of obtaining the reported 14.5% WER, the LSTM got 14.9% in one environment and 15.1% in another environment.
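For reference, the run was launched with the standard entry point described in the README, i.e. something like `python run_exp.py cfg/TIMIT_baselines/TIMIT_LSTM_fmllr_cudnn.cfg` (paths below reflect my local setup).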

TIMIT_LSTM_fmllr_cudnn.cfg

```
[cfg_proto]
cfg_proto = proto/global.proto
cfg_proto_chunk = proto/global_chunk.proto

[exp]
cmd =
run_nn_script = run_nn
out_folder = exp/TIMIT_LSTM_fmllr_cudnn
seed = 2234
use_cuda = True
multi_gpu = False
save_gpumem = False
n_epochs_tr = 24

[dataset1]
data_name = TIMIT_tr
fea = fea_name=mfcc
    fea_lst=/home/arvoelke/git/kaldi/egs/timit/s5/data/train/feats.scp
    fea_opts=apply-cmvn --utt2spk=ark:/home/arvoelke/git/kaldi/egs/timit/s5/data/train/utt2spk ark:/home/arvoelke/git/kaldi/egs/timit/s5/mfcc/cmvn_train.ark ark:- ark:- | add-deltas --delta-order=2 ark:- ark:- |
    cw_left=0
    cw_right=0
    fea_name=fbank
    fea_lst=/home/arvoelke/git/kaldi/egs/timit/s5/data-fbank/train/feats.scp
    fea_opts=apply-cmvn --utt2spk=ark:/home/arvoelke/git/kaldi/egs/timit/s5/data/train/utt2spk ark:/home/arvoelke/git/kaldi/egs/timit/s5/fbank/cmvn_train.ark ark:- ark:- | add-deltas --delta-order=0 ark:- ark:- |
    cw_left=0
    cw_right=0
    fea_name=fmllr
    fea_lst=/home/arvoelke/git/kaldi/egs/timit/s5/data-fmllr-tri3/train/feats.scp
    fea_opts=apply-cmvn --utt2spk=ark:/home/arvoelke/git/kaldi/egs/timit/s5/data/train/utt2spk ark:/home/arvoelke/git/kaldi/egs/timit/s5/data-fmllr-tri3/cmvn_train.ark ark:- ark:- | add-deltas --delta-order=0 ark:- ark:- |
    cw_left=0
    cw_right=0
lab = lab_name=lab_cd
    lab_folder=/home/arvoelke/git/kaldi/egs/timit/s5/exp/dnn4_pretrain-dbn_dnn_ali
    lab_opts=ali-to-pdf
    lab_count_file=auto
    lab_data_folder=/home/arvoelke/git/kaldi/egs/timit/s5/data/train/
    lab_graph=/home/arvoelke/git/kaldi/egs/timit/s5/exp/tri3/graph
    lab_name=lab_mono
    lab_folder=/home/arvoelke/git/kaldi/egs/timit/s5/exp/dnn4_pretrain-dbn_dnn_ali
    lab_opts=ali-to-phones --per-frame=true
    lab_count_file=none
    lab_data_folder=/home/arvoelke/git/kaldi/egs/timit/s5/data/train/
    lab_graph=/home/arvoelke/git/kaldi/egs/timit/s5/exp/tri3/graph
n_chunks = 5

[dataset2]
data_name = TIMIT_dev
fea = fea_name=mfcc
    fea_lst=/home/arvoelke/git/kaldi/egs/timit/s5/data/dev/feats.scp
    fea_opts=apply-cmvn --utt2spk=ark:/home/arvoelke/git/kaldi/egs/timit/s5/data/dev/utt2spk ark:/home/arvoelke/git/kaldi/egs/timit/s5/mfcc/cmvn_dev.ark ark:- ark:- | add-deltas --delta-order=2 ark:- ark:- |
    cw_left=0
    cw_right=0
    fea_name=fbank
    fea_lst=/home/arvoelke/git/kaldi/egs/timit/s5/data-fbank/dev/feats.scp
    fea_opts=apply-cmvn --utt2spk=ark:/home/arvoelke/git/kaldi/egs/timit/s5/data/dev/utt2spk ark:/home/arvoelke/git/kaldi/egs/timit/s5/fbank/cmvn_dev.ark ark:- ark:- | add-deltas --delta-order=0 ark:- ark:- |
    cw_left=0
    cw_right=0
    fea_name=fmllr
    fea_lst=/home/arvoelke/git/kaldi/egs/timit/s5/data-fmllr-tri3/dev/feats.scp
    fea_opts=apply-cmvn --utt2spk=ark:/home/arvoelke/git/kaldi/egs/timit/s5/data/dev/utt2spk ark:/home/arvoelke/git/kaldi/egs/timit/s5/data-fmllr-tri3/cmvn_dev.ark ark:- ark:- | add-deltas --delta-order=0 ark:- ark:- |
    cw_left=0
    cw_right=0
lab = lab_name=lab_cd
    lab_folder=/home/arvoelke/git/kaldi/egs/timit/s5/exp/dnn4_pretrain-dbn_dnn_ali_dev
    lab_opts=ali-to-pdf
    lab_count_file=auto
    lab_data_folder=/home/arvoelke/git/kaldi/egs/timit/s5/data/dev/
    lab_graph=/home/arvoelke/git/kaldi/egs/timit/s5/exp/tri3/graph
    lab_name=lab_mono
    lab_folder=/home/arvoelke/git/kaldi/egs/timit/s5/exp/dnn4_pretrain-dbn_dnn_ali_dev
    lab_opts=ali-to-phones --per-frame=true
    lab_count_file=none
    lab_data_folder=/home/arvoelke/git/kaldi/egs/timit/s5/data/dev/
    lab_graph=/home/arvoelke/git/kaldi/egs/timit/s5/exp/tri3/graph
n_chunks = 1

[dataset3]
data_name = TIMIT_test
fea = fea_name=mfcc
    fea_lst=/home/arvoelke/git/kaldi/egs/timit/s5/data/test/feats.scp
    fea_opts=apply-cmvn --utt2spk=ark:/home/arvoelke/git/kaldi/egs/timit/s5/data/test/utt2spk ark:/home/arvoelke/git/kaldi/egs/timit/s5/mfcc/cmvn_test.ark ark:- ark:- | add-deltas --delta-order=2 ark:- ark:- |
    cw_left=0
    cw_right=0
    fea_name=fbank
    fea_lst=/home/arvoelke/git/kaldi/egs/timit/s5/data-fbank/test/feats.scp
    fea_opts=apply-cmvn --utt2spk=ark:/home/arvoelke/git/kaldi/egs/timit/s5/data/test/utt2spk ark:/home/arvoelke/git/kaldi/egs/timit/s5/fbank/cmvn_test.ark ark:- ark:- | add-deltas --delta-order=0 ark:- ark:- |
    cw_left=0
    cw_right=0
    fea_name=fmllr
    fea_lst=/home/arvoelke/git/kaldi/egs/timit/s5/data-fmllr-tri3/test/feats.scp
    fea_opts=apply-cmvn --utt2spk=ark:/home/arvoelke/git/kaldi/egs/timit/s5/data/test/utt2spk ark:/home/arvoelke/git/kaldi/egs/timit/s5/data-fmllr-tri3/cmvn_test.ark ark:- ark:- | add-deltas --delta-order=0 ark:- ark:- |
    cw_left=0
    cw_right=0
lab = lab_name=lab_cd
    lab_folder=/home/arvoelke/git/kaldi/egs/timit/s5/exp/dnn4_pretrain-dbn_dnn_ali_test
    lab_opts=ali-to-pdf
    lab_count_file=auto
    lab_data_folder=/home/arvoelke/git/kaldi/egs/timit/s5/data/test/
    lab_graph=/home/arvoelke/git/kaldi/egs/timit/s5/exp/tri3/graph
    lab_name=lab_mono
    lab_folder=/home/arvoelke/git/kaldi/egs/timit/s5/exp/dnn4_pretrain-dbn_dnn_ali_test
    lab_opts=ali-to-phones --per-frame=true
    lab_count_file=none
    lab_data_folder=/home/arvoelke/git/kaldi/egs/timit/s5/data/test/
    lab_graph=/home/arvoelke/git/kaldi/egs/timit/s5/exp/tri3/graph
n_chunks = 1

[data_use]
train_with = TIMIT_tr
valid_with = TIMIT_dev
forward_with = TIMIT_test

[batches]
batch_size_train = 8
max_seq_length_train = 1000
increase_seq_length_train = True
start_seq_len_train = 100
multply_factor_seq_len_train = 2
batch_size_valid = 8
max_seq_length_valid = 1000

[architecture1]
arch_name = LSTM_cudnn_layers
arch_proto = proto/LSTM_cudnn.proto
arch_library = neural_networks
arch_class = LSTM_cudnn
arch_pretrain_file = none
arch_freeze = False
arch_seq_model = True
hidden_size=550
num_layers=4
bias=True
batch_first=True
dropout=0.2
bidirectional=True
arch_lr = 0.0016
arch_halving_factor = 0.5
arch_improvement_threshold = 0.001
arch_opt = rmsprop
opt_momentum = 0.0
opt_alpha = 0.95
opt_eps = 1e-8
opt_centered = False
opt_weight_decay = 0.0

[architecture2]
arch_name = MLP_layers
arch_proto = proto/MLP.proto
arch_library = neural_networks
arch_class = MLP
arch_pretrain_file = none
arch_freeze = False
arch_seq_model = False
dnn_lay = N_out_lab_cd
dnn_drop = 0.0
dnn_use_laynorm_inp = False
dnn_use_batchnorm_inp = False
dnn_use_batchnorm = False
dnn_use_laynorm = False
dnn_act = softmax
arch_lr = 0.0016
arch_halving_factor = 0.5
arch_improvement_threshold = 0.001
arch_opt = rmsprop
opt_momentum = 0.0
opt_alpha = 0.95
opt_eps = 1e-8
opt_centered = False
opt_weight_decay = 0.0

[architecture3]
arch_name = MLP_layers2
arch_proto = proto/MLP.proto
arch_library = neural_networks
arch_class = MLP
arch_pretrain_file = none
arch_freeze = False
arch_seq_model = False
dnn_lay = N_out_lab_mono
dnn_drop = 0.0
dnn_use_laynorm_inp = False
dnn_use_batchnorm_inp = False
dnn_use_batchnorm = False
dnn_use_laynorm = False
dnn_act = softmax
arch_lr = 0.0004
arch_halving_factor = 0.5
arch_improvement_threshold = 0.001
arch_opt = rmsprop
opt_momentum = 0.0
opt_alpha = 0.95
opt_eps = 1e-8
opt_centered = False
opt_weight_decay = 0.0

[model]
model_proto = proto/model.proto
model = out_dnn1=compute(LSTM_cudnn_layers,fmllr)
    out_dnn2=compute(MLP_layers,out_dnn1)
    out_dnn3=compute(MLP_layers2,out_dnn1)
    loss_mono=cost_nll(out_dnn3,lab_mono)
    loss_mono_w=mult_constant(loss_mono,1.0)
    loss_cd=cost_nll(out_dnn2,lab_cd)
    loss_final=sum(loss_cd,loss_mono_w)
    err_final=cost_err(out_dnn2,lab_cd)

[forward]
forward_out = out_dnn2
normalize_posteriors = True
normalize_with_counts_from = lab_cd
save_out_file = False
require_decoding = True

[decoding]
decoding_script_folder = kaldi_decoding_scripts/
decoding_script = decode_dnn.sh
decoding_proto = proto/decoding.proto
min_active = 200
max_active = 7000
max_mem = 50000000
beam = 13.0
latbeam = 8.0
acwt = 0.2
max_arcs = -1
skip_scoring = false
scoring_script = local/score.sh
scoring_opts = "--min-lmwt 1 --max-lmwt 10"
norm_vars = False
```
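As an aside for readers, the [model] section above corresponds roughly to the following PyTorch computation. This is only a minimal sketch: `fmllr_dim`, `n_cd`, and `n_mono` are placeholder dimensions I made up for illustration, not values taken from the config, and pytorch-kaldi's actual implementation lives in its own classes.

```
# Minimal sketch of the computation described in [model]:
# an LSTM over fmllr features feeding two softmax heads (context-dependent + monophone targets).
import torch
import torch.nn as nn
import torch.nn.functional as F

fmllr_dim, n_cd, n_mono = 40, 1936, 48  # placeholder dimensions, not the real sizes

lstm = nn.LSTM(input_size=fmllr_dim, hidden_size=550, num_layers=4,
               batch_first=True, dropout=0.2, bidirectional=True)
mlp_cd = nn.Linear(2 * 550, n_cd)      # MLP_layers  -> lab_cd targets
mlp_mono = nn.Linear(2 * 550, n_mono)  # MLP_layers2 -> lab_mono targets

x = torch.randn(8, 100, fmllr_dim)      # [batch, time, features], fake fmllr input
lab_cd = torch.randint(n_cd, (8, 100))  # fake alignments
lab_mono = torch.randint(n_mono, (8, 100))

out_dnn1, _ = lstm(x)                                 # out_dnn1 = compute(LSTM_cudnn_layers, fmllr)
out_dnn2 = F.log_softmax(mlp_cd(out_dnn1), dim=-1)    # out_dnn2 = compute(MLP_layers, out_dnn1)
out_dnn3 = F.log_softmax(mlp_mono(out_dnn1), dim=-1)  # out_dnn3 = compute(MLP_layers2, out_dnn1)

loss_cd = F.nll_loss(out_dnn2.reshape(-1, n_cd), lab_cd.reshape(-1))
loss_mono = F.nll_loss(out_dnn3.reshape(-1, n_mono), lab_mono.reshape(-1))
loss_final = loss_cd + 1.0 * loss_mono                # loss_final = sum(loss_cd, loss_mono_w)
```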

(15.1%) res.res

```
ep=00 tr=['TIMIT_tr'] loss=5.633 err=0.731 valid=TIMIT_dev loss=2.810 err=0.506 lr_architecture1=0.0016 lr_architecture2=0.0016 lr_architecture3=0.0004 time(s)=355
ep=01 tr=['TIMIT_tr'] loss=2.268 err=0.425 valid=TIMIT_dev loss=2.105 err=0.399 lr_architecture1=0.0016 lr_architecture2=0.0016 lr_architecture3=0.0004 time(s)=367
ep=02 tr=['TIMIT_tr'] loss=1.661 err=0.329 valid=TIMIT_dev loss=1.956 err=0.374 lr_architecture1=0.0016 lr_architecture2=0.0016 lr_architecture3=0.0004 time(s)=349
ep=03 tr=['TIMIT_tr'] loss=1.375 err=0.282 valid=TIMIT_dev loss=1.937 err=0.368 lr_architecture1=0.0016 lr_architecture2=0.0016 lr_architecture3=0.0004 time(s)=355
ep=04 tr=['TIMIT_tr'] loss=1.155 err=0.244 valid=TIMIT_dev loss=1.946 err=0.365 lr_architecture1=0.0016 lr_architecture2=0.0016 lr_architecture3=0.0004 time(s)=354
ep=05 tr=['TIMIT_tr'] loss=0.983 err=0.214 valid=TIMIT_dev loss=1.942 err=0.353 lr_architecture1=0.0016 lr_architecture2=0.0016 lr_architecture3=0.0004 time(s)=282
ep=06 tr=['TIMIT_tr'] loss=0.838 err=0.187 valid=TIMIT_dev loss=1.992 err=0.356 lr_architecture1=0.0016 lr_architecture2=0.0016 lr_architecture3=0.0004 time(s)=260
ep=07 tr=['TIMIT_tr'] loss=0.567 err=0.128 valid=TIMIT_dev loss=1.975 err=0.335 lr_architecture1=0.0008 lr_architecture2=0.0008 lr_architecture3=0.0002 time(s)=259
ep=08 tr=['TIMIT_tr'] loss=0.440 err=0.100 valid=TIMIT_dev loss=2.053 err=0.335 lr_architecture1=0.0008 lr_architecture2=0.0008 lr_architecture3=0.0002 time(s)=259
ep=09 tr=['TIMIT_tr'] loss=0.315 err=0.068 valid=TIMIT_dev loss=2.105 err=0.330 lr_architecture1=0.0004 lr_architecture2=0.0004 lr_architecture3=0.0001 time(s)=261
ep=10 tr=['TIMIT_tr'] loss=0.256 err=0.054 valid=TIMIT_dev loss=2.176 err=0.329 lr_architecture1=0.0004 lr_architecture2=0.0004 lr_architecture3=0.0001 time(s)=257
ep=11 tr=['TIMIT_tr'] loss=0.217 err=0.045 valid=TIMIT_dev loss=2.252 err=0.331 lr_architecture1=0.0004 lr_architecture2=0.0004 lr_architecture3=0.0001 time(s)=260
ep=12 tr=['TIMIT_tr'] loss=0.171 err=0.032 valid=TIMIT_dev loss=2.299 err=0.327 lr_architecture1=0.0002 lr_architecture2=0.0002 lr_architecture3=5e-05 time(s)=258
ep=13 tr=['TIMIT_tr'] loss=0.149 err=0.026 valid=TIMIT_dev loss=2.353 err=0.329 lr_architecture1=0.0002 lr_architecture2=0.0002 lr_architecture3=5e-05 time(s)=259
ep=14 tr=['TIMIT_tr'] loss=0.129 err=0.020 valid=TIMIT_dev loss=2.388 err=0.327 lr_architecture1=0.0001 lr_architecture2=0.0001 lr_architecture3=2.5e-05 time(s)=257
ep=15 tr=['TIMIT_tr'] loss=0.119 err=0.018 valid=TIMIT_dev loss=2.414 err=0.327 lr_architecture1=0.0001 lr_architecture2=0.0001 lr_architecture3=2.5e-05 time(s)=260
ep=16 tr=['TIMIT_tr'] loss=0.110 err=0.015 valid=TIMIT_dev loss=2.435 err=0.327 lr_architecture1=5e-05 lr_architecture2=5e-05 lr_architecture3=1.25e-05 time(s)=260
ep=17 tr=['TIMIT_tr'] loss=0.105 err=0.014 valid=TIMIT_dev loss=2.446 err=0.327 lr_architecture1=2.5e-05 lr_architecture2=2.5e-05 lr_architecture3=6.25e-06 time(s)=258
ep=18 tr=['TIMIT_tr'] loss=0.103 err=0.013 valid=TIMIT_dev loss=2.451 err=0.327 lr_architecture1=1.25e-05 lr_architecture2=1.25e-05 lr_architecture3=3.125e-06 time(s)=260
ep=19 tr=['TIMIT_tr'] loss=0.101 err=0.013 valid=TIMIT_dev loss=2.453 err=0.327 lr_architecture1=6.25e-06 lr_architecture2=6.25e-06 lr_architecture3=1.5625e-06 time(s)=258
ep=20 tr=['TIMIT_tr'] loss=0.100 err=0.013 valid=TIMIT_dev loss=2.454 err=0.327 lr_architecture1=3.125e-06 lr_architecture2=3.125e-06 lr_architecture3=7.8125e-07 time(s)=260
ep=21 tr=['TIMIT_tr'] loss=0.100 err=0.013 valid=TIMIT_dev loss=2.456 err=0.327 lr_architecture1=1.5625e-06 lr_architecture2=1.5625e-06 lr_architecture3=3.90625e-07 time(s)=258
ep=22 tr=['TIMIT_tr'] loss=0.100 err=0.013 valid=TIMIT_dev loss=2.455 err=0.327 lr_architecture1=7.8125e-07 lr_architecture2=7.8125e-07 lr_architecture3=1.953125e-07 time(s)=210
ep=23 tr=['TIMIT_tr'] loss=0.099 err=0.012 valid=TIMIT_dev loss=2.456 err=0.327 lr_architecture1=3.90625e-07 lr_architecture2=3.90625e-07 lr_architecture3=9.765625e-08 time(s)=157
%WER 15.1 | 192 7215 | 87.2 9.8 3.0 2.4 15.1 98.4 | -1.925 | /home/arvoelke/git/pytorch-kaldi/exp/lstm0/decode_TIMIT_test_out_dnn2/score_6/ctm_39phn.filt.sys
```

(14.9%) res.res

```
ep=00 tr=['TIMIT_tr'] loss=6.826 err=0.801 valid=TIMIT_dev loss=3.100 err=0.544 lr_architecture1=0.0016 lr_architecture2=0.0016 lr_architecture3=0.0004 time(s)=346
ep=01 tr=['TIMIT_tr'] loss=2.439 err=0.449 valid=TIMIT_dev loss=2.164 err=0.408 lr_architecture1=0.0016 lr_architecture2=0.0016 lr_architecture3=0.0004 time(s)=339
ep=02 tr=['TIMIT_tr'] loss=1.752 err=0.344 valid=TIMIT_dev loss=1.971 err=0.376 lr_architecture1=0.0016 lr_architecture2=0.0016 lr_architecture3=0.0004 time(s)=322
ep=03 tr=['TIMIT_tr'] loss=1.452 err=0.296 valid=TIMIT_dev loss=1.957 err=0.371 lr_architecture1=0.0016 lr_architecture2=0.0016 lr_architecture3=0.0004 time(s)=330
ep=04 tr=['TIMIT_tr'] loss=1.224 err=0.257 valid=TIMIT_dev loss=1.927 err=0.363 lr_architecture1=0.0016 lr_architecture2=0.0016 lr_architecture3=0.0004 time(s)=331
ep=05 tr=['TIMIT_tr'] loss=1.045 err=0.227 valid=TIMIT_dev loss=1.931 err=0.355 lr_architecture1=0.0016 lr_architecture2=0.0016 lr_architecture3=0.0004 time(s)=293
ep=06 tr=['TIMIT_tr'] loss=0.890 err=0.198 valid=TIMIT_dev loss=1.966 err=0.351 lr_architecture1=0.0016 lr_architecture2=0.0016 lr_architecture3=0.0004 time(s)=237
ep=07 tr=['TIMIT_tr'] loss=0.763 err=0.174 valid=TIMIT_dev loss=2.061 err=0.352 lr_architecture1=0.0016 lr_architecture2=0.0016 lr_architecture3=0.0004 time(s)=241
ep=08 tr=['TIMIT_tr'] loss=0.513 err=0.118 valid=TIMIT_dev loss=2.057 err=0.338 lr_architecture1=0.0008 lr_architecture2=0.0008 lr_architecture3=0.0002 time(s)=240
ep=09 tr=['TIMIT_tr'] loss=0.396 err=0.092 valid=TIMIT_dev loss=2.139 err=0.337 lr_architecture1=0.0008 lr_architecture2=0.0008 lr_architecture3=0.0002 time(s)=241
ep=10 tr=['TIMIT_tr'] loss=0.326 err=0.076 valid=TIMIT_dev loss=2.222 err=0.334 lr_architecture1=0.0008 lr_architecture2=0.0008 lr_architecture3=0.0002 time(s)=241
ep=11 tr=['TIMIT_tr'] loss=0.274 err=0.064 valid=TIMIT_dev loss=2.341 err=0.335 lr_architecture1=0.0008 lr_architecture2=0.0008 lr_architecture3=0.0002 time(s)=242
ep=12 tr=['TIMIT_tr'] loss=0.196 err=0.043 valid=TIMIT_dev loss=2.389 err=0.329 lr_architecture1=0.0004 lr_architecture2=0.0004 lr_architecture3=0.0001 time(s)=242
ep=13 tr=['TIMIT_tr'] loss=0.155 err=0.032 valid=TIMIT_dev loss=2.461 err=0.329 lr_architecture1=0.0004 lr_architecture2=0.0004 lr_architecture3=0.0001 time(s)=240
ep=14 tr=['TIMIT_tr'] loss=0.122 err=0.022 valid=TIMIT_dev loss=2.525 err=0.326 lr_architecture1=0.0002 lr_architecture2=0.0002 lr_architecture3=5e-05 time(s)=241
ep=15 tr=['TIMIT_tr'] loss=0.105 err=0.018 valid=TIMIT_dev loss=2.577 err=0.327 lr_architecture1=0.0002 lr_architecture2=0.0002 lr_architecture3=5e-05 time(s)=240
ep=16 tr=['TIMIT_tr'] loss=0.090 err=0.014 valid=TIMIT_dev loss=2.608 err=0.326 lr_architecture1=0.0001 lr_architecture2=0.0001 lr_architecture3=2.5e-05 time(s)=242
ep=17 tr=['TIMIT_tr'] loss=0.083 err=0.012 valid=TIMIT_dev loss=2.645 err=0.325 lr_architecture1=0.0001 lr_architecture2=0.0001 lr_architecture3=2.5e-05 time(s)=242
ep=18 tr=['TIMIT_tr'] loss=0.078 err=0.011 valid=TIMIT_dev loss=2.666 err=0.326 lr_architecture1=0.0001 lr_architecture2=0.0001 lr_architecture3=2.5e-05 time(s)=239
ep=19 tr=['TIMIT_tr'] loss=0.072 err=0.009 valid=TIMIT_dev loss=2.688 err=0.326 lr_architecture1=5e-05 lr_architecture2=5e-05 lr_architecture3=1.25e-05 time(s)=242
ep=20 tr=['TIMIT_tr'] loss=0.069 err=0.008 valid=TIMIT_dev loss=2.696 err=0.326 lr_architecture1=2.5e-05 lr_architecture2=2.5e-05 lr_architecture3=6.25e-06 time(s)=239
ep=21 tr=['TIMIT_tr'] loss=0.067 err=0.008 valid=TIMIT_dev loss=2.707 err=0.326 lr_architecture1=1.25e-05 lr_architecture2=1.25e-05 lr_architecture3=3.125e-06 time(s)=242
ep=22 tr=['TIMIT_tr'] loss=0.066 err=0.007 valid=TIMIT_dev loss=2.707 err=0.326 lr_architecture1=6.25e-06 lr_architecture2=6.25e-06 lr_architecture3=1.5625e-06 time(s)=241
ep=23 tr=['TIMIT_tr'] loss=0.065 err=0.007 valid=TIMIT_dev loss=2.708 err=0.326 lr_architecture1=3.125e-06 lr_architecture2=3.125e-06 lr_architecture3=7.8125e-07 time(s)=240
%WER 14.9 | 192 7215 | 87.1 9.8 3.0 2.1 14.9 98.4 | -1.878 | /home/arvoelke/git/pytorch-kaldi/exp/lstm1/decode_TIMIT_test_out_dnn2/score_8/ctm_39phn.filt.sys
```

I reran each experiment 2-3 times and obtained the same result each time in the respective environment.

Environment:

I can include the .yml environment files for the two conda environments, but the biggest difference that jumps out at me is:

TParcollet commented 4 years ago

Hi! For some reason, the cuDNN version obtains worse performance... Could you try the non-cuDNN version?

Thanks.

arvoelke commented 4 years ago

Thanks for the help. That did slightly better (14.8% WER) using the conda environment with the pytorch channel. To run the config file, the only parts I modified were the [dataset*] sections.

TIMIT_LSTM_fmllr.cfg

```
[cfg_proto]
cfg_proto = proto/global.proto
cfg_proto_chunk = proto/global_chunk.proto

[exp]
cmd =
run_nn_script = run_nn
out_folder = exp/TIMIT_LSTM_fmllr
seed = 2234
use_cuda = True
multi_gpu = False
save_gpumem = False
n_epochs_tr = 24

[dataset1]
data_name = TIMIT_tr
fea = fea_name=mfcc
    fea_lst=/home/arvoelke/git/kaldi/egs/timit/s5/data/train/feats.scp
    fea_opts=apply-cmvn --utt2spk=ark:/home/arvoelke/git/kaldi/egs/timit/s5/data/train/utt2spk ark:/home/arvoelke/git/kaldi/egs/timit/s5/mfcc/cmvn_train.ark ark:- ark:- | add-deltas --delta-order=2 ark:- ark:- |
    cw_left=0
    cw_right=0
    fea_name=fbank
    fea_lst=/home/arvoelke/git/kaldi/egs/timit/s5/data-fbank/train/feats.scp
    fea_opts=apply-cmvn --utt2spk=ark:/home/arvoelke/git/kaldi/egs/timit/s5/data/train/utt2spk ark:/home/arvoelke/git/kaldi/egs/timit/s5/fbank/cmvn_train.ark ark:- ark:- | add-deltas --delta-order=0 ark:- ark:- |
    cw_left=0
    cw_right=0
    fea_name=fmllr
    fea_lst=/home/arvoelke/git/kaldi/egs/timit/s5/data-fmllr-tri3/train/feats.scp
    fea_opts=apply-cmvn --utt2spk=ark:/home/arvoelke/git/kaldi/egs/timit/s5/data/train/utt2spk ark:/home/arvoelke/git/kaldi/egs/timit/s5/data-fmllr-tri3/cmvn_train.ark ark:- ark:- | add-deltas --delta-order=0 ark:- ark:- |
    cw_left=0
    cw_right=0
lab = lab_name=lab_cd
    lab_folder=/home/arvoelke/git/kaldi/egs/timit/s5/exp/dnn4_pretrain-dbn_dnn_ali
    lab_opts=ali-to-pdf
    lab_count_file=auto
    lab_data_folder=/home/arvoelke/git/kaldi/egs/timit/s5/data/train/
    lab_graph=/home/arvoelke/git/kaldi/egs/timit/s5/exp/tri3/graph
    lab_name=lab_mono
    lab_folder=/home/arvoelke/git/kaldi/egs/timit/s5/exp/dnn4_pretrain-dbn_dnn_ali
    lab_opts=ali-to-phones --per-frame=true
    lab_count_file=none
    lab_data_folder=/home/arvoelke/git/kaldi/egs/timit/s5/data/train/
    lab_graph=/home/arvoelke/git/kaldi/egs/timit/s5/exp/tri3/graph
n_chunks = 5

[dataset2]
data_name = TIMIT_dev
fea = fea_name=mfcc
    fea_lst=/home/arvoelke/git/kaldi/egs/timit/s5/data/dev/feats.scp
    fea_opts=apply-cmvn --utt2spk=ark:/home/arvoelke/git/kaldi/egs/timit/s5/data/dev/utt2spk ark:/home/arvoelke/git/kaldi/egs/timit/s5/mfcc/cmvn_dev.ark ark:- ark:- | add-deltas --delta-order=2 ark:- ark:- |
    cw_left=0
    cw_right=0
    fea_name=fbank
    fea_lst=/home/arvoelke/git/kaldi/egs/timit/s5/data-fbank/dev/feats.scp
    fea_opts=apply-cmvn --utt2spk=ark:/home/arvoelke/git/kaldi/egs/timit/s5/data/dev/utt2spk ark:/home/arvoelke/git/kaldi/egs/timit/s5/fbank/cmvn_dev.ark ark:- ark:- | add-deltas --delta-order=0 ark:- ark:- |
    cw_left=0
    cw_right=0
    fea_name=fmllr
    fea_lst=/home/arvoelke/git/kaldi/egs/timit/s5/data-fmllr-tri3/dev/feats.scp
    fea_opts=apply-cmvn --utt2spk=ark:/home/arvoelke/git/kaldi/egs/timit/s5/data/dev/utt2spk ark:/home/arvoelke/git/kaldi/egs/timit/s5/data-fmllr-tri3/cmvn_dev.ark ark:- ark:- | add-deltas --delta-order=0 ark:- ark:- |
    cw_left=0
    cw_right=0
lab = lab_name=lab_cd
    lab_folder=/home/arvoelke/git/kaldi/egs/timit/s5/exp/dnn4_pretrain-dbn_dnn_ali_dev
    lab_opts=ali-to-pdf
    lab_count_file=auto
    lab_data_folder=/home/arvoelke/git/kaldi/egs/timit/s5/data/dev/
    lab_graph=/home/arvoelke/git/kaldi/egs/timit/s5/exp/tri3/graph
    lab_name=lab_mono
    lab_folder=/home/arvoelke/git/kaldi/egs/timit/s5/exp/dnn4_pretrain-dbn_dnn_ali_dev
    lab_opts=ali-to-phones --per-frame=true
    lab_count_file=none
    lab_data_folder=/home/arvoelke/git/kaldi/egs/timit/s5/data/dev/
    lab_graph=/home/arvoelke/git/kaldi/egs/timit/s5/exp/tri3/graph
n_chunks = 1

[dataset3]
data_name = TIMIT_test
fea = fea_name=mfcc
    fea_lst=/home/arvoelke/git/kaldi/egs/timit/s5/data/test/feats.scp
    fea_opts=apply-cmvn --utt2spk=ark:/home/arvoelke/git/kaldi/egs/timit/s5/data/test/utt2spk ark:/home/arvoelke/git/kaldi/egs/timit/s5/mfcc/cmvn_test.ark ark:- ark:- | add-deltas --delta-order=2 ark:- ark:- |
    cw_left=0
    cw_right=0
    fea_name=fbank
    fea_lst=/home/arvoelke/git/kaldi/egs/timit/s5/data-fbank/test/feats.scp
    fea_opts=apply-cmvn --utt2spk=ark:/home/arvoelke/git/kaldi/egs/timit/s5/data/test/utt2spk ark:/home/arvoelke/git/kaldi/egs/timit/s5/fbank/cmvn_test.ark ark:- ark:- | add-deltas --delta-order=0 ark:- ark:- |
    cw_left=0
    cw_right=0
    fea_name=fmllr
    fea_lst=/home/arvoelke/git/kaldi/egs/timit/s5/data-fmllr-tri3/test/feats.scp
    fea_opts=apply-cmvn --utt2spk=ark:/home/arvoelke/git/kaldi/egs/timit/s5/data/test/utt2spk ark:/home/arvoelke/git/kaldi/egs/timit/s5/data-fmllr-tri3/cmvn_test.ark ark:- ark:- | add-deltas --delta-order=0 ark:- ark:- |
    cw_left=0
    cw_right=0
lab = lab_name=lab_cd
    lab_folder=/home/arvoelke/git/kaldi/egs/timit/s5/exp/dnn4_pretrain-dbn_dnn_ali_test
    lab_opts=ali-to-pdf
    lab_count_file=auto
    lab_data_folder=/home/arvoelke/git/kaldi/egs/timit/s5/data/test/
    lab_graph=/home/arvoelke/git/kaldi/egs/timit/s5/exp/tri3/graph
    lab_name=lab_mono
    lab_folder=/home/arvoelke/git/kaldi/egs/timit/s5/exp/dnn4_pretrain-dbn_dnn_ali_test
    lab_opts=ali-to-phones --per-frame=true
    lab_count_file=none
    lab_data_folder=/home/arvoelke/git/kaldi/egs/timit/s5/data/test/
    lab_graph=/home/arvoelke/git/kaldi/egs/timit/s5/exp/tri3/graph
n_chunks = 1

[data_use]
train_with = TIMIT_tr
valid_with = TIMIT_dev
forward_with = TIMIT_test

[batches]
batch_size_train = 8
max_seq_length_train = 1000
increase_seq_length_train = True
start_seq_len_train = 100
multply_factor_seq_len_train = 2
batch_size_valid = 8
max_seq_length_valid = 1000

[architecture1]
arch_name = LSTM_cudnn_layers
arch_proto = proto/LSTM.proto
arch_library = neural_networks
arch_class = LSTM
arch_pretrain_file = none
arch_freeze = False
arch_seq_model = True
lstm_lay = 550,550,550,550
lstm_drop = 0.2,0.2,0.2,0.2
lstm_use_laynorm_inp = False
lstm_use_batchnorm_inp = False
lstm_use_laynorm = False,False,False,False
lstm_use_batchnorm = True,True,True,True
lstm_bidir = True
lstm_act = tanh,tanh,tanh,tanh
lstm_orthinit = True
arch_lr = 0.0016
arch_halving_factor = 0.5
arch_improvement_threshold = 0.001
arch_opt = rmsprop
opt_momentum = 0.0
opt_alpha = 0.95
opt_eps = 1e-8
opt_centered = False
opt_weight_decay = 0.0

[architecture2]
arch_name = MLP_layers
arch_proto = proto/MLP.proto
arch_library = neural_networks
arch_class = MLP
arch_pretrain_file = none
arch_freeze = False
arch_seq_model = False
dnn_lay = N_out_lab_cd
dnn_drop = 0.0
dnn_use_laynorm_inp = False
dnn_use_batchnorm_inp = False
dnn_use_batchnorm = False
dnn_use_laynorm = False
dnn_act = softmax
arch_lr = 0.0016
arch_halving_factor = 0.5
arch_improvement_threshold = 0.001
arch_opt = rmsprop
opt_momentum = 0.0
opt_alpha = 0.95
opt_eps = 1e-8
opt_centered = False
opt_weight_decay = 0.0

[architecture3]
arch_name = MLP_layers2
arch_proto = proto/MLP.proto
arch_library = neural_networks
arch_class = MLP
arch_pretrain_file = none
arch_freeze = False
arch_seq_model = False
dnn_lay = N_out_lab_mono
dnn_drop = 0.0
dnn_use_laynorm_inp = False
dnn_use_batchnorm_inp = False
dnn_use_batchnorm = False
dnn_use_laynorm = False
dnn_act = softmax
arch_lr = 0.0004
arch_halving_factor = 0.5
arch_improvement_threshold = 0.001
arch_opt = rmsprop
opt_momentum = 0.0
opt_alpha = 0.95
opt_eps = 1e-8
opt_centered = False
opt_weight_decay = 0.0

[model]
model_proto = proto/model.proto
model = out_dnn1=compute(LSTM_cudnn_layers,fmllr)
    out_dnn2=compute(MLP_layers,out_dnn1)
    out_dnn3=compute(MLP_layers2,out_dnn1)
    loss_mono=cost_nll(out_dnn3,lab_mono)
    loss_mono_w=mult_constant(loss_mono,1.0)
    loss_cd=cost_nll(out_dnn2,lab_cd)
    loss_final=sum(loss_cd,loss_mono_w)
    err_final=cost_err(out_dnn2,lab_cd)

[forward]
forward_out = out_dnn2
normalize_posteriors = True
normalize_with_counts_from = lab_cd
save_out_file = False
require_decoding = True

[decoding]
decoding_script_folder = kaldi_decoding_scripts/
decoding_script = decode_dnn.sh
decoding_proto = proto/decoding.proto
min_active = 200
max_active = 7000
max_mem = 50000000
beam = 13.0
latbeam = 8.0
acwt = 0.2
max_arcs = -1
skip_scoring = false
scoring_script = local/score.sh
scoring_opts = "--min-lmwt 1 --max-lmwt 10"
norm_vars = False
```

(14.8%) res.res

```
ep=00 tr=['TIMIT_tr'] loss=4.186 err=0.637 valid=TIMIT_dev loss=2.686 err=0.481 lr_architecture1=0.0016 lr_architecture2=0.0016 lr_architecture3=0.0004 time(s)=1140
ep=01 tr=['TIMIT_tr'] loss=2.390 err=0.441 valid=TIMIT_dev loss=2.188 err=0.403 lr_architecture1=0.0016 lr_architecture2=0.0016 lr_architecture3=0.0004 time(s)=1259
ep=02 tr=['TIMIT_tr'] loss=1.862 err=0.361 valid=TIMIT_dev loss=1.990 err=0.361 lr_architecture1=0.0016 lr_architecture2=0.0016 lr_architecture3=0.0004 time(s)=1469
ep=03 tr=['TIMIT_tr'] loss=1.632 err=0.325 valid=TIMIT_dev loss=1.966 err=0.358 lr_architecture1=0.0016 lr_architecture2=0.0016 lr_architecture3=0.0004 time(s)=1339
ep=04 tr=['TIMIT_tr'] loss=1.464 err=0.299 valid=TIMIT_dev loss=1.957 err=0.349 lr_architecture1=0.0016 lr_architecture2=0.0016 lr_architecture3=0.0004 time(s)=1403
ep=05 tr=['TIMIT_tr'] loss=1.342 err=0.280 valid=TIMIT_dev loss=1.974 err=0.345 lr_architecture1=0.0016 lr_architecture2=0.0016 lr_architecture3=0.0004 time(s)=1438
ep=06 tr=['TIMIT_tr'] loss=1.236 err=0.262 valid=TIMIT_dev loss=2.000 err=0.344 lr_architecture1=0.0016 lr_architecture2=0.0016 lr_architecture3=0.0004 time(s)=1434
ep=07 tr=['TIMIT_tr'] loss=1.151 err=0.247 valid=TIMIT_dev loss=1.999 err=0.339 lr_architecture1=0.0016 lr_architecture2=0.0016 lr_architecture3=0.0004 time(s)=1459
ep=08 tr=['TIMIT_tr'] loss=1.066 err=0.232 valid=TIMIT_dev loss=2.021 err=0.338 lr_architecture1=0.0016 lr_architecture2=0.0016 lr_architecture3=0.0004 time(s)=1462
ep=09 tr=['TIMIT_tr'] loss=1.005 err=0.222 valid=TIMIT_dev loss=2.072 err=0.340 lr_architecture1=0.0016 lr_architecture2=0.0016 lr_architecture3=0.0004 time(s)=1531
ep=10 tr=['TIMIT_tr'] loss=0.818 err=0.185 valid=TIMIT_dev loss=2.042 err=0.326 lr_architecture1=0.0008 lr_architecture2=0.0008 lr_architecture3=0.0002 time(s)=1551
ep=11 tr=['TIMIT_tr'] loss=0.744 err=0.171 valid=TIMIT_dev loss=2.077 err=0.325 lr_architecture1=0.0008 lr_architecture2=0.0008 lr_architecture3=0.0002 time(s)=1489
ep=12 tr=['TIMIT_tr'] loss=0.693 err=0.161 valid=TIMIT_dev loss=2.154 err=0.327 lr_architecture1=0.0008 lr_architecture2=0.0008 lr_architecture3=0.0002 time(s)=1542
ep=13 tr=['TIMIT_tr'] loss=0.613 err=0.144 valid=TIMIT_dev loss=2.145 err=0.319 lr_architecture1=0.0004 lr_architecture2=0.0004 lr_architecture3=0.0001 time(s)=1558
ep=14 tr=['TIMIT_tr'] loss=0.580 err=0.138 valid=TIMIT_dev loss=2.180 err=0.319 lr_architecture1=0.0004 lr_architecture2=0.0004 lr_architecture3=0.0001 time(s)=1434
ep=15 tr=['TIMIT_tr'] loss=0.538 err=0.129 valid=TIMIT_dev loss=2.208 err=0.316 lr_architecture1=0.0002 lr_architecture2=0.0002 lr_architecture3=5e-05 time(s)=1393
ep=16 tr=['TIMIT_tr'] loss=0.520 err=0.125 valid=TIMIT_dev loss=2.237 err=0.318 lr_architecture1=0.0002 lr_architecture2=0.0002 lr_architecture3=5e-05 time(s)=1541
ep=17 tr=['TIMIT_tr'] loss=0.500 err=0.120 valid=TIMIT_dev loss=2.241 err=0.316 lr_architecture1=0.0001 lr_architecture2=0.0001 lr_architecture3=2.5e-05 time(s)=1561
ep=18 tr=['TIMIT_tr'] loss=0.494 err=0.119 valid=TIMIT_dev loss=2.247 err=0.315 lr_architecture1=0.0001 lr_architecture2=0.0001 lr_architecture3=2.5e-05 time(s)=1490
ep=19 tr=['TIMIT_tr'] loss=0.485 err=0.117 valid=TIMIT_dev loss=2.254 err=0.315 lr_architecture1=0.0001 lr_architecture2=0.0001 lr_architecture3=2.5e-05 time(s)=1419
ep=20 tr=['TIMIT_tr'] loss=0.477 err=0.115 valid=TIMIT_dev loss=2.269 err=0.314 lr_architecture1=5e-05 lr_architecture2=5e-05 lr_architecture3=1.25e-05 time(s)=1490
ep=21 tr=['TIMIT_tr'] loss=0.471 err=0.114 valid=TIMIT_dev loss=2.277 err=0.314 lr_architecture1=5e-05 lr_architecture2=5e-05 lr_architecture3=1.25e-05 time(s)=1407
ep=22 tr=['TIMIT_tr'] loss=0.466 err=0.113 valid=TIMIT_dev loss=2.279 err=0.315 lr_architecture1=2.5e-05 lr_architecture2=2.5e-05 lr_architecture3=6.25e-06 time(s)=1518
ep=23 tr=['TIMIT_tr'] loss=0.462 err=0.112 valid=TIMIT_dev loss=2.282 err=0.314 lr_architecture1=1.25e-05 lr_architecture2=1.25e-05 lr_architecture3=3.125e-06 time(s)=1500
%WER 14.8 | 192 7215 | 87.4 9.8 2.8 2.2 14.8 99.5 | -2.030 | /home/arvoelke/git/pytorch-kaldi/exp/TIMIT_LSTM_fmllr/decode_TIMIT_test_out_dnn2/score_6/ctm_39phn.filt.sys
```

Could there be issues with using a different CUDA version, or random variability due to different environments? Would it help to run with several different seeds and take the average test result?
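To make that concrete, this is the kind of aggregation I have in mind, assuming I collect the %WER from each run's res.res by hand (the seeds and WER values below are just placeholders):

```
# Aggregate final %WER over several runs with different seeds (placeholder values).
import statistics

wer_per_seed = {1234: 14.8, 2234: 15.1, 3234: 14.9}  # seed -> %WER, filled in by hand

mean_wer = statistics.mean(wer_per_seed.values())
std_wer = statistics.stdev(wer_per_seed.values())
print(f"WER over {len(wer_per_seed)} seeds: {mean_wer:.1f} +/- {std_wer:.1f}")
```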

TParcollet commented 4 years ago

Not that much; the variability is about 0.2 on TIMIT. Could you please post your res.res file? I suspect an old bug...

arvoelke commented 4 years ago

The res.res file is in my previous post (click the triangle/name to toggle).

TParcollet commented 4 years ago

I see, sorry. I suspected an LR bug, but apparently there is none. Hmm, you could try multiple runs to see if you can reach our 14.5%. @mravanelli I'm asking, though I believe the answer is no: have we changed the original configuration files for the TIMIT recipes?

Meanwhile, I'll try on my side. Could you please try to replicate the FBANK or MFCC results, to see whether this is a general problem or specific to fmllr?

mravanelli commented 4 years ago

Hi, we didn't change anything on our side. The main difference is that those results were obtained with an older version of pytorch (0.4) and an older CUDA. I suspect some changes in those libraries slightly alter the performance.
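If it helps, a quick way to check which pytorch/CUDA/cuDNN versions a given conda environment actually uses is something like this (a minimal sketch, not part of the recipe):

```
# Print the PyTorch / CUDA / cuDNN versions seen by the current environment.
import torch

print("torch :", torch.__version__)
print("cuda  :", torch.version.cuda)
print("cudnn :", torch.backends.cudnn.version())
print("device:", torch.cuda.get_device_name(0) if torch.cuda.is_available() else "cpu")
```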


arvoelke commented 4 years ago

Thanks for getting back. I will look into trying older versions as well as the other feature types when I can. In doing that, should I rerun the kaldi-asr scripts to regenerate the datasets each time I switch pytorch/CUDA versions? Or do you know whether the difference would be isolated to training (in which case I could reuse the same datasets)?