srvk / eesen

The official repository of the Eesen project
http://arxiv.org/abs/1507.08240
Apache License 2.0

Reproducing results on WSJ #133

Closed. samurain closed this issue 7 years ago.

samurain commented 7 years ago

I am not able to reproduce the results found in RESULTS for WSJ phoneme and character networks.

I used LDC93S6A and LDC94S13A instead of LDC93S6B and LDC94S13B, but as far as I can tell that shouldn't matter because I think A contains B as a subset.

Data prep and training completed without errors when I ran run_ctc_phn.sh and run_ctc_char.sh, but I end up with results like the following:

%WER 13.80 [ 1136 / 8234, 111 ins, 154 del, 871 sub ] exp/train_char_l4_c320/decode_dev93_tgpr_larger/wer_5

compared to the expected result from the RESULTS file:

"%WER 12.12 [ 998 / 8234, 89 ins, 138 del, 771 sub ] exp/train_char_l4_c320/decode_dev93_tgpr_larger/wer_0.6"

and also:

%WER 12.67 [ 1043 / 8234, 159 ins, 105 del, 779 sub ] exp/train_phn_l4_c320/decode_dev93_tgpr/wer_9

compared to the expected:

"%WER 11.39 [ 938 / 8234, 119 ins, 127 del, 692 sub ] exp/train_phn_l4_c320/decode_dev93_tgpr/wer_0.8"

fmetze commented 7 years ago

Hm, haven't run this recipe in its original form in a long time. Do you have the phone accuracies during training?

samurain commented 7 years ago

Here is the output from training:

steps/train_ctc_parallel.sh --add-deltas true --num-sequence 10 --frame-num-limit 25000 \
  --learn-rate 0.00004 --report-step 1000 \
  data/train_tr95 data/train_cv05 exp/train_phn_l4_c320

Initializing model as exp/train_phn_l4_c320/nnet/nnet.iter0
TRAINING STARTS [2017-Apr-13 09:55:43]
[NOTE] TOKEN_ACCURACY refers to token accuracy, i.e., (1.0 - token_error_rate).
EPOCH 1 RUNNING ... ENDS [2017-Apr-13 14:29:21]: lrate 4e-05, TRAIN ACCURACY 57.8365%, VALID ACCURACY 71.2901%
EPOCH 2 RUNNING ... ENDS [2017-Apr-13 19:08:09]: lrate 4e-05, TRAIN ACCURACY 79.9270%, VALID ACCURACY 79.3757%
EPOCH 3 RUNNING ... ENDS [2017-Apr-14 00:01:07]: lrate 4e-05, TRAIN ACCURACY 83.3859%, VALID ACCURACY 77.1722%
EPOCH 4 RUNNING ... ENDS [2017-Apr-14 04:35:47]: lrate 2e-05, TRAIN ACCURACY 84.9726%, VALID ACCURACY 84.2100%
EPOCH 5 RUNNING ... ENDS [2017-Apr-14 09:30:23]: lrate 1e-05, TRAIN ACCURACY 88.0189%, VALID ACCURACY 86.0482%
EPOCH 6 RUNNING ... ENDS [2017-Apr-14 14:11:17]: lrate 5e-06, TRAIN ACCURACY 89.1337%, VALID ACCURACY 86.8891%
EPOCH 7 RUNNING ... ENDS [2017-Apr-14 19:04:58]: lrate 2.5e-06, TRAIN ACCURACY 89.7143%, VALID ACCURACY 87.4017%
EPOCH 8 RUNNING ... ENDS [2017-Apr-14 23:41:33]: lrate 1.25e-06, TRAIN ACCURACY 89.9519%, VALID ACCURACY 87.6855%
EPOCH 9 RUNNING ... ENDS [2017-Apr-15 04:40:04]: lrate 6.25e-07, TRAIN ACCURACY 90.1284%, VALID ACCURACY 87.8644%
EPOCH 10 RUNNING ... ENDS [2017-Apr-15 09:33:05]: lrate 3.125e-07, TRAIN ACCURACY 90.2208%, VALID ACCURACY 88.0547%
EPOCH 11 RUNNING ... ENDS [2017-Apr-15 14:28:04]: lrate 1.5625e-07, TRAIN ACCURACY 90.2622%, VALID ACCURACY 88.1603%
EPOCH 12 RUNNING ... ENDS [2017-Apr-15 19:20:01]: lrate 7.8125e-08, TRAIN ACCURACY 90.2743%, VALID ACCURACY 88.2562%
finished, too small rel. improvement .0959
Training succeeded. The final model exp/train_phn_l4_c320/final.nnet
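For reference, the per-epoch numbers above can also be pulled straight out of the per-iteration logs with grep, assuming the usual exp/<dir>/log layout that the paths later in this thread follow:

# CV token accuracy per epoch (log file names as quoted in the reply further down).
grep TOKEN_ACCURACY exp/train_phn_l4_c320/log/cv.iter*.log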

fmetze commented 7 years ago

What happens if you provide "--halving-after-epoch 10" to train_ctc_parallel.sh? I notice that the learning rate already starts halving after 3 epochs (for some reason the validation accuracy drops in epoch 3, which triggers the halving). This means the final learning rate becomes very small (i.e., effectively no further training happens) and the model is not fully trained.
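Concretely, that is the same command as before with the extra flag added; a sketch, keeping the options from the earlier run and writing to a fresh output directory (the _halving name matches the logs in the next reply):

# Retrain with learning-rate halving deferred until after epoch 10.
steps/train_ctc_parallel.sh --add-deltas true --num-sequence 10 --frame-num-limit 25000 \
  --learn-rate 0.00004 --report-step 1000 --halving-after-epoch 10 \
  data/train_tr95 data/train_cv05 exp/train_phn_l4_c320_halving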

samurain commented 7 years ago

That helped. It trained 6 extra epochs, and the CV token accuracies are higher than before:

exp/train_phn_l4_c320_halving/log/cv.iter1.log:TOKEN_ACCURACY >> 71.2901% <<
exp/train_phn_l4_c320_halving/log/cv.iter2.log:TOKEN_ACCURACY >> 79.3757% <<
exp/train_phn_l4_c320_halving/log/cv.iter3.log:TOKEN_ACCURACY >> 77.1722% <<
exp/train_phn_l4_c320_halving/log/cv.iter4.log:TOKEN_ACCURACY >> 83.0641% <<
exp/train_phn_l4_c320_halving/log/cv.iter5.log:TOKEN_ACCURACY >> 84.7452% <<
exp/train_phn_l4_c320_halving/log/cv.iter6.log:TOKEN_ACCURACY >> 84.5603% <<
exp/train_phn_l4_c320_halving/log/cv.iter7.log:TOKEN_ACCURACY >> 86.9495% <<
exp/train_phn_l4_c320_halving/log/cv.iter8.log:TOKEN_ACCURACY >> 87.3292% <<
exp/train_phn_l4_c320_halving/log/cv.iter9.log:TOKEN_ACCURACY >> 87.715% <<
exp/train_phn_l4_c320_halving/log/cv.iter10.log:TOKEN_ACCURACY >> 87.9588% <<
exp/train_phn_l4_c320_halving/log/cv.iter11.log:TOKEN_ACCURACY >> 87.8712% <<
exp/train_phn_l4_c320_halving/log/cv.iter12.log:TOKEN_ACCURACY >> 90.0815% <<
exp/train_phn_l4_c320_halving/log/cv.iter13.log:TOKEN_ACCURACY >> 90.8885% <<
exp/train_phn_l4_c320_halving/log/cv.iter14.log:TOKEN_ACCURACY >> 91.3248% <<
exp/train_phn_l4_c320_halving/log/cv.iter15.log:TOKEN_ACCURACY >> 91.5973% <<
exp/train_phn_l4_c320_halving/log/cv.iter16.log:TOKEN_ACCURACY >> 91.7257% <<
exp/train_phn_l4_c320_halving/log/cv.iter17.log:TOKEN_ACCURACY >> 91.8457% <<
exp/train_phn_l4_c320_halving/log/cv.iter18.log:TOKEN_ACCURACY >> 91.9061% <<

Decoding dev93 with the settings from the example run_ctc_phn.sh gives:

%WER 11.53 [ 949 / 8234, 145 ins, 102 del, 702 sub ] exp/train_phn_l4_c320_halving/decode_dev93_tgpr/wer_9

which is still a little worse than what is in the RESULTS file. However, decoding with a very wide beam and more max-active arcs gives:

%WER 11.26 [ 927 / 8234, 138 ins, 98 del, 691 sub ] exp/train_phn_l4_c320_halving/decode_dev93_tgpr/wer_9

which is comparable to what is in the RESULTS file.
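For anyone reproducing this, the wider search would look roughly like the sketch below. The script name (steps/decode_ctc_lat.sh), its Kaldi-style --beam and --max-active options, and the graph-directory name are assumptions based on the layout of the paths above, and the values are only illustrative of "very wide":

# Illustrative wide-beam decode; adjust the script name, options, and directories to your checkout.
steps/decode_ctc_lat.sh --beam 30.0 --max-active 14000 \
  data/lang_phn_test_tgpr data/test_dev93 exp/train_phn_l4_c320_halving/decode_dev93_tgpr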

Thanks for your help!