Closed vijayaditya closed 8 years ago
After trying Dan's suggestion in #768 and not finding a large improvement, I decided to make an nnet3 model that matches the nnet2 model in local/online/run_nnet2_ms_perturbed.sh as closely as possible: same pnorm dimensions, same number of initial and final jobs, same learning rate, epochs, and so on.
Also, I have not been working on this as much as I would like to. @vince62s, do not let me delay you if you would like to experiment yourself.
I saw your results and they are not as good as the nnet2 numbers in the RESULTS file. The thing is that I am working on release 2 of TEDLIUM, so it takes more time and the numbers are not directly comparable. However, I was able to improve a bit further on nnet2 with two adjustments. First, the lexicon: there is a slight mismatch. I noticed prepare_data.sh was referring to TEDLIUM.150K.dic, which is the TEDLIUM lexicon, whereas prepare_lm.sh builds the lexicon from cantab-TEDLIUM.dct; they are not equal, so I decided to merge them. Second, I used a less heavily pruned 3-gram model for decoding. When I get time I'll test the new pruned LMs from pocolm. PS: I came to this because rescoring with various 4-gram LMs alone did not bring any improvement.
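A merge along these lines can be sketched in a few lines of Python. This is only a sketch: the two-column word/pronunciation format and the idea of keeping every distinct (word, pronunciation) pair from both files are assumptions on my part, not what the recipe actually does.

```python
def merge_lexicons(path_a, path_b, out_path):
    # Collect every distinct (word, pronunciation) pair from both lexicons,
    # so duplicated prons are kept once and differing prons are all kept.
    entries = set()
    for path in (path_a, path_b):
        with open(path, encoding="utf-8") as f:
            for line in f:
                line = line.strip()
                if line:
                    word, _, pron = line.partition(" ")
                    entries.add((word, pron.strip()))
    with open(out_path, "w", encoding="utf-8") as out:
        for word, pron in sorted(entries):
            out.write(f"{word} {pron}\n")
```

Running this on two 150k-word lexicons with ~38k words of difference would give roughly the 200k-entry merged dictionary described above.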
I'd be surprised if there were many words in TEDLIUM.150K.dic which were not in cantab-TEDLIUM.dct.
@vince62s regarding the dictionary, I looked at the script which uses it (join_suffix.py), and all it is doing is transforming tokenizations like "it 's" to "it's". I doubt what dictionary is used causes much difference, so long as both dictionaries have common contractions like "it's", "they're" and so on. Did you test this change independently of the language model change and still get good results?
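The kind of transformation Dan describes can be sketched roughly as follows. This is a simplified stand-in, not the actual join_suffix.py logic, and the suffix set is my assumption:

```python
# Sketch: glue a detached contraction token back onto the preceding word,
# so a tokenization like "it 's" becomes "it's".
SUFFIXES = {"'s", "'re", "'ll", "'ve", "'d", "'m", "n't"}

def join_suffixes(text):
    out = []
    for token in text.split():
        if out and token in SUFFIXES:
            out[-1] += token  # attach suffix to previous word
        else:
            out.append(token)
    return " ".join(out)
```

As long as both dictionaries carry the common contractions, this step should behave the same with either one.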
But anyway, it is good to know you are not idle because of me.
No improvement at all from using a pnorm nnet3 model with the same parameters as the nnet2 model. I'm sure there are other differences between nnet2 and nnet3, like mixing up, that are a hassle to deal with anyway.
nnet3 experiment:
$ for x in exp/nnet3/tdnn_new_splice__b/decode*; do [ -d $x ] && grep Sum $x/score_*/*.sys | utils/best_wer.sh ; done 2>/dev/null
%WER 17.0 | 507 17792 | 86.2 10.2 3.5 3.3 17.0 90.7 | -0.005 | exp/nnet3/tdnn_new_splice__b/decode_dev/score_10_0.0/ctm.filt.filt.sys
%WER 15.2 | 507 17792 | 87.7 8.9 3.4 2.9 15.2 88.2 | -0.085 | exp/nnet3/tdnn_new_splice__b/decode_dev_rescore/score_10_0.0/ctm.filt.filt.sys
%WER 15.2 | 1155 27512 | 87.3 9.6 3.1 2.4 15.2 85.5 | -0.003 | exp/nnet3/tdnn_new_splice__b/decode_test/score_10_0.0/ctm.filt.filt.sys
%WER 13.8 | 1155 27512 | 88.4 8.5 3.1 2.2 13.8 82.9 | -0.069 | exp/nnet3/tdnn_new_splice__b/decode_test_rescore/score_10_0.0/ctm.filt.filt.sys
nnet2 baseline (note this also uses pnorm)
# multi-splice + i-vector + perturbed
%WER 14.0 | 507 17792 | 88.5 8.4 3.0 2.5 14.0 88.0 | -0.074 | exp/nnet2_online/nnet_ms_sp/decode_dev/score_12/ctm.filt.filt.sys
%WER 13.3 | 1155 27512 | 88.7 8.7 2.6 2.0 13.3 81.5 | -0.097 | exp/nnet2_online/nnet_ms_sp/decode_test/score_10/ctm.filt.filt.sys
%WER 13.2 | 1155 27512 | 88.6 8.5 2.8 1.9 13.2 81.6 | -0.102 | exp/nnet2_online/nnet_ms_sp_online/decode_test/score_11/ctm.filt.filt.sys
%WER 13.6 | 1155 27512 | 88.5 8.9 2.6 2.1 13.6 82.7 | -0.095 | exp/nnet2_online/nnet_ms_sp_online/decode_test_utt/score_10/ctm.filt.filt.sys
# multi-splice + i-vector + perturbed + rescore
%WER 11.9 | 1155 27512 | 89.9 7.5 2.6 1.8 11.9 77.9 | -0.177 | exp/nnet2_online/nnet_ms_sp/decode_test.rescore/score_10/ctm.filt.filt.sys
%WER 11.8 | 1155 27512 | 90.0 7.4 2.6 1.8 11.8 77.6 | -0.300 | exp/nnet2_online/nnet_ms_sp_online/decode_test.rescore/score_10/ctm.filt.filt.sys
%WER 11.8 | 1155 27512 | 89.9 7.4 2.7 1.8 11.8 79.0 | -0.233 | exp/nnet2_online/nnet_ms_sp_online/decode_test_utt_offline.rescore/score_11/ctm.filt.filt.sys
%WER 12.3 | 1155 27512 | 89.5 7.6 3.0 1.8 12.3 80.5 | -0.200 | exp/nnet2_online/nnet_ms_sp_online/decode_test_utt.rescore/score_12/ctm.filt.filt.sys
I try not to look at the test set, but for some reason the nnet2 baseline does not list its dev set results. Anyway, you can see that my recent nnet3 model lags behind by 1-3% absolute.
I'm going to rerun the nnet2 baseline script now, to see its results on the dev set.
Also, to be clear, I am working only on cross-entropy trained models. Sequence training would just take too much time right now, and a better cross-entropy model should yield a better sequence-trained model.
My working directory is /export/ws15-ffs-data/dgalvez/kaldi-git/egs/tedlium/s5
All the new experiments I will be running will match the glob exp/nnet3/tdnn_new_splice* from that directory.
I already found your directory.
It looks like in all of these results, both GMM and DNN, we are getting quite a lot of <unk> decoded. I think that could be messing up the results.
Probably this only started happening after we fixed that problem with the pronunciation of <unk> in the cantab LM. I'm not sure if you re-ran the nnet2 training after that was fixed?
One way to fix this is the --remove-oov flag to mkgraph.sh -- you'd have to re-make the graph and re-decode. That prevents <unk> from ever being decoded.
Dan
I never did run the nnet2 training after we fixed the problem with the cantab LM.
Your suggestion certainly seems reasonable. I believe that cross entropy neural net models should not depend on graph creation (so that I can use the same cross entropy trained model with different graphs). Is this correct? It will speed up my investigations.
Yes, it doesn't depend on graph creation. Actually, I'm starting to doubt whether <unk> is the problem... it mostly seems to be inserted between words, and it's removed in scoring. But it's worth trying anyway. And maybe running the nnet2 baseline will show us something. Dan
I notice that the default scoring script in that directory scores from LMWT = 10 to 20, and that may not be optimal for the nnet3 models -- it's getting the best WER at 10_0.0. You could change the range to 8 to 17 in local/score_sclite.sh, which should be enough for all model types. It will probably make little difference, though.
Dan
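Mechanically, the sweep just picks the LM weight whose decode scored the lowest WER; a toy sketch of that selection (the function name and the numbers are made up for illustration):

```python
# Toy version of the best-WER selection over a scoring sweep:
# given (lmwt, wer) pairs, pick the LM weight with the lowest WER.
def best_lmwt(scores):
    return min(scores, key=lambda pair: pair[1])

# Hypothetical sweep over part of the LMWT 8..17 range suggested above.
sweep = [(8, 15.1), (9, 14.8), (10, 14.9), (11, 15.0)]
```

If the best WER sits at the edge of the swept range (as with 10_0.0 here), widening the range is the obvious check.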
RE the dict: @galv, yes, the impact of TEDLIUM.150k.dic on the scripts is negligible. BUT there is about a 38K-word difference between the two dicts. Both are 150k words, but not the same words. So, to be safe, I merged the two dicts, leading to about 200k words (with duplicated prons). The impact on the test set was minimal, from 10.5% to 10.4% for the best run. I did not store it, but the impact on the dev set was a bit larger. @danpovey I doubt the <unk> issue, because my runs were after the fix too. Actually I have to try the LM scale change in scoring too; I get the best results at 10_0.0 as well.
@vince62s, did you run the nnet2 baseline? And did you get the same numbers as in the RESULTS file, or similar? Do grep Overall exp/whatever/log/computeprob*.final.log -- I want to see the diagnostics. Dan
nnet-compute-prob exp/nnet2_online/nnet_ms_sp/final.mdl ark:exp/nnet2_online/nnet_ms_sp/egs/train_diagnostic.egs Started at Thu Jun 2 00:09:07 CEST 2016
nnet-compute-prob exp/nnet2_online/nnet_ms_sp/final.mdl ark:exp/nnet2_online/nnet_ms_sp/egs/train_diagnostic.egs LOG (nnet-compute-prob:main():nnet-compute-prob.cc:91) Saw 4000 examples, average probability is -0.933567 and accuracy is 0.72975 with total weight 4000 -0.933567 Accounting: time=27 threads=1 Ended (code 0) at Thu Jun 2 00:09:34 CEST 2016, elapsed time 27 seconds
nnet-compute-prob exp/nnet2_online/nnet_ms_sp/final.mdl ark:exp/nnet2_online/nnet_ms_sp/egs/valid_diagnostic.egs Started at Thu Jun 2 00:09:07 CEST 2016
nnet-compute-prob exp/nnet2_online/nnet_ms_sp/final.mdl ark:exp/nnet2_online/nnet_ms_sp/egs/valid_diagnostic.egs LOG (nnet-compute-prob:main():nnet-compute-prob.cc:91) Saw 4000 examples, average probability is -1.39614 and accuracy is 0.64025 with total weight 4000 -1.39614 Accounting: time=27 threads=1 Ended (code 0) at Thu Jun 2 00:09:34 CEST 2016, elapsed time 27 seconds
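Those diagnostics can be compared mechanically; here is a sketch that pulls the average log-probability out of an nnet-compute-prob log line and reports the train/valid gap. The regex is an assumption based on the log format shown above, and the strings are excerpts from those logs:

```python
import re

# Extract "average probability is X" from an nnet-compute-prob log line.
def avg_logprob(log_line):
    m = re.search(r"average probability is (-?\d+\.?\d*)", log_line)
    if m is None:
        raise ValueError("no average probability found")
    return float(m.group(1))

train_log = "... average probability is -0.933567 and accuracy is 0.72975 ..."
valid_log = "... average probability is -1.39614 and accuracy is 0.64025 ..."

# A large train-minus-valid gap points at overfitting rather than underfitting.
gap = avg_logprob(train_log) - avg_logprob(valid_log)
```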
Hm. It looks like his objective functions are quite a bit better than our nnet3 ones. I think we may be using too few parameters. But it could also be that the p-norm setup benefits from having relatively few parameters in the final layer, which might matter in this setup for some reason. @galv, perhaps you could run an nnet3 setup in which you try to replicate the old nnet2 setup as exactly as possible, including p-norm nonlinearities? I think the script does still support that. Dan
@danpovey, I actually already ran an experiment trying to replicate the old nnet2 setup. I can't reference GitHub messages as far as I know, but it's the one with the tables in it above. I did not get competitive results with it, which is why I reran the nnet2 recipe. The nnet2 recipe failed, though, because the disk ran out of space. Working on it.
this experiment trying to replicate the nnet2 recipe in nnet3 corresponds to /export/ws15-ffs-data/dgalvez/kaldi-git/egs/tedlium/s5/exp/nnet3/tdnn_new_splice__b/
It looks like what you actually ran is a 350-dimensional ReLU. In the nnet3 python script, it's not enough to specify the p-norm input and output dims, you also have to specify the nonlinearity type. @vijayaditya, I think this may not be the best design. Dan
And something else that differs is as follows: the nnet2 config is
"layer0/-2:-1:0:1:2 layer1/-1:2 layer3/-3:3 layer4/-7:2"
and notice that there is no splicing for layer 2, so you'd have to insert a "0" in the nnet3 splicing indexes after -1,2.
What you have is:
"-2,-1,0,1,2 -1,2 -3,3 -7,2 0"
see #833
So, should someone re-run the nnet3 recipe?
Probably @galv will rerun something soon now that that issue is fixed.
BTW, it doesn't affect the checked-in recipe, because that uses ReLUs, not p-norm.
I'll get to it in a few more hours.
Daniel Galvez
Here are the full results for nnet2's baseline model. I'll create a pull request with RESULTS updated to include this at some point, as I don't think we ever ran the nnet2 models after the language model changed.
%WER 14.4 | 507 17792 | 88.3 8.4 3.3 2.8 14.4 88.0 | -0.081 | exp/nnet2_online/nnet_ms_sp/decode_dev/score_10_0.5/ctm.filt.filt.sys
%WER 13.0 | 507 17792 | 89.3 7.4 3.2 2.3 13.0 84.8 | -0.207 | exp/nnet2_online/nnet_ms_sp/decode_dev.rescore/score_10_0.5/ctm.filt.filt.sys
%WER 13.0 | 1155 27512 | 88.9 8.1 3.0 2.0 13.0 82.3 | -0.107 | exp/nnet2_online/nnet_ms_sp/decode_test/score_11_0.5/ctm.filt.filt.sys
%WER 11.8 | 1155 27512 | 90.3 7.3 2.3 2.1 11.8 79.6 | -0.200 | exp/nnet2_online/nnet_ms_sp/decode_test.rescore/score_10_0.0/ctm.filt.filt.sys
%WER 14.2 | 507 17792 | 88.2 8.2 3.6 2.4 14.2 88.0 | -0.158 | exp/nnet2_online/nnet_ms_sp_online/decode_dev/score_10_1.0/ctm.filt.filt.sys
%WER 12.9 | 507 17792 | 89.3 7.4 3.3 2.2 12.9 84.4 | -0.307 | exp/nnet2_online/nnet_ms_sp_online/decode_dev.rescore/score_11_0.5/ctm.filt.filt.sys
%WER 14.3 | 507 17792 | 88.1 8.4 3.5 2.4 14.3 88.0 | -0.148 | exp/nnet2_online/nnet_ms_sp_online/decode_dev_utt/score_10_1.0/ctm.filt.filt.sys
%WER 13.2 | 507 17792 | 89.4 7.7 2.9 2.6 13.2 85.4 | -0.263 | exp/nnet2_online/nnet_ms_sp_online/decode_dev_utt.rescore/score_11_0.0/ctm.filt.filt.sys
%WER 14.0 | 507 17792 | 88.3 8.3 3.3 2.3 14.0 88.4 | -0.167 | exp/nnet2_online/nnet_ms_sp_online/decode_dev_utt_offline/score_10_1.0/ctm.filt.filt.sys
%WER 12.9 | 507 17792 | 89.7 7.7 2.6 2.6 12.9 85.0 | -0.296 | exp/nnet2_online/nnet_ms_sp_online/decode_dev_utt_offline.rescore/score_10_0.0/ctm.filt.filt.sys
%WER 13.0 | 1155 27512 | 89.0 8.1 2.9 2.0 13.0 82.3 | -0.157 | exp/nnet2_online/nnet_ms_sp_online/decode_test/score_11_0.5/ctm.filt.filt.sys
%WER 11.7 | 1155 27512 | 90.4 7.2 2.4 2.1 11.7 79.2 | -0.321 | exp/nnet2_online/nnet_ms_sp_online/decode_test.rescore/score_10_0.0/ctm.filt.filt.sys
%WER 13.6 | 1155 27512 | 88.6 8.7 2.7 2.2 13.6 84.2 | -0.152 | exp/nnet2_online/nnet_ms_sp_online/decode_test_utt/score_10_0.5/ctm.filt.filt.sys
%WER 12.2 | 1155 27512 | 89.7 7.6 2.7 2.0 12.2 80.5 | -0.313 | exp/nnet2_online/nnet_ms_sp_online/decode_test_utt.rescore/score_10_0.5/ctm.filt.filt.sys
%WER 12.9 | 1155 27512 | 88.6 8.1 3.3 1.6 12.9 82.7 | -0.159 | exp/nnet2_online/nnet_ms_sp_online/decode_test_utt_offline/score_11_1.0/ctm.filt.filt.sys
%WER 11.7 | 1155 27512 | 90.3 7.4 2.3 2.1 11.7 79.4 | -0.327 | exp/nnet2_online/nnet_ms_sp_online/decode_test_utt_offline.rescore/score_10_0.0/ctm.filt.filt.sys
The biggest thing to note here is that the test set results are better than the dev set results. Hopefully this is just because the test set is easier than the dev set. I can look through my old nnet3 experiments to verify this, but unfortunately I am 99% positive that my dev and test set results were within 0.5% absolute of each other.
Here are the results on just the test set:
# multi-splice + i-vector + perturbed
%WER 13.0 | 1155 27512 | 88.9 8.1 3.0 2.0 13.0 82.3 | -0.107 | exp/nnet2_online/nnet_ms_sp/decode_test/score_11_0.5/ctm.filt.filt.sys
%WER 13.0 | 1155 27512 | 89.0 8.1 2.9 2.0 13.0 82.3 | -0.157 | exp/nnet2_online/nnet_ms_sp_online/decode_test/score_11_0.5/ctm.filt.filt.sys
%WER 13.6 | 1155 27512 | 88.6 8.7 2.7 2.2 13.6 84.2 | -0.152 | exp/nnet2_online/nnet_ms_sp_online/decode_test_utt/score_10_0.5/ctm.filt.filt.sys
%WER 12.9 | 1155 27512 | 88.6 8.1 3.3 1.6 12.9 82.7 | -0.159 | exp/nnet2_online/nnet_ms_sp_online/decode_test_utt_offline/score_11_1.0/ctm.filt.filt.sys
# multi-splice + i-vector + perturbed + rescore
%WER 11.8 | 1155 27512 | 90.3 7.3 2.3 2.1 11.8 79.6 | -0.200 | exp/nnet2_online/nnet_ms_sp/decode_test.rescore/score_10_0.0/ctm.filt.filt.sys
%WER 11.7 | 1155 27512 | 90.4 7.2 2.4 2.1 11.7 79.2 | -0.321 | exp/nnet2_online/nnet_ms_sp_online/decode_test.rescore/score_10_0.0/ctm.filt.filt.sys
%WER 12.2 | 1155 27512 | 89.7 7.6 2.7 2.0 12.2 80.5 | -0.313 | exp/nnet2_online/nnet_ms_sp_online/decode_test_utt.rescore/score_10_0.5/ctm.filt.filt.sys
%WER 11.7 | 1155 27512 | 90.3 7.4 2.3 2.1 11.7 79.4 | -0.327 | exp/nnet2_online/nnet_ms_sp_online/decode_test_utt_offline.rescore/score_10_0.0/ctm.filt.filt.sys
Compare to the previous nnet2 results:
# multi-splice + i-vector + perturbed
%WER 14.0 | 507 17792 | 88.5 8.4 3.0 2.5 14.0 88.0 | -0.074 | exp/nnet2_online/nnet_ms_sp/decode_dev/score_12/ctm.filt.filt.sys
%WER 13.3 | 1155 27512 | 88.7 8.7 2.6 2.0 13.3 81.5 | -0.097 | exp/nnet2_online/nnet_ms_sp/decode_test/score_10/ctm.filt.filt.sys
%WER 13.2 | 1155 27512 | 88.6 8.5 2.8 1.9 13.2 81.6 | -0.102 | exp/nnet2_online/nnet_ms_sp_online/decode_test/score_11/ctm.filt.filt.sys
%WER 13.6 | 1155 27512 | 88.5 8.9 2.6 2.1 13.6 82.7 | -0.095 | exp/nnet2_online/nnet_ms_sp_online/decode_test_utt/score_10/ctm.filt.filt.sys
# multi-splice + i-vector + perturbed + rescore
%WER 11.9 | 1155 27512 | 89.9 7.5 2.6 1.8 11.9 77.9 | -0.177 | exp/nnet2_online/nnet_ms_sp/decode_test.rescore/score_10/ctm.filt.filt.sys
%WER 11.8 | 1155 27512 | 90.0 7.4 2.6 1.8 11.8 77.6 | -0.300 | exp/nnet2_online/nnet_ms_sp_online/decode_test.rescore/score_10/ctm.filt.filt.sys
%WER 11.8 | 1155 27512 | 89.9 7.4 2.7 1.8 11.8 79.0 | -0.233 | exp/nnet2_online/nnet_ms_sp_online/decode_test_utt_offline.rescore/score_11/ctm.filt.filt.sys
%WER 12.3 | 1155 27512 | 89.5 7.6 3.0 1.8 12.3 80.5 | -0.200 | exp/nnet2_online/nnet_ms_sp_online/decode_test_utt.rescore/score_12/ctm.filt.filt.sys
I'll get an actual nnet3 pnorm experiment running tonight, ideally within the hour.
I am using this as my splice indices for the nnet3 setup:
--splice-indexes "-2,-1,0,1,2 -1,2 0 -3,3 -7,2 0"
as opposed to this in nnet2: layer0/-2:-1:0:1:2 layer1/-1:2 layer3/-3:3 layer4/-7:2
There is a 0 at the end for nnet3. I believe there was a reason for this, and I'm guessing the nnet2 model implicitly has this as well, but I want to make sure this final 0 is fine.
Yes that looks right. In nnet2 you explicitly state the number of layers, and layers with no splicing default to "0"
Hey @galv, did you end up with better results for nnet3?
I did not. Here are the nnet3 baseline results:
(local/nnet3/run_tdnn.sh)
# %WER 14.6 | 507 17792 | 87.9 8.7 3.4 2.5 14.6 88.6 | -0.111 | exp/nnet3/tdnn/decode_dev/score_10_0.5/ctm.filt.filt.sys
# %WER 13.2 | 507 17792 | 89.4 7.7 2.9 2.6 13.2 85.0 | -0.170 | exp/nnet3/tdnn/decode_dev_rescore/score_10_0.0/ctm.filt.filt.sys
# %WER 13.5 | 1155 27512 | 88.7 8.5 2.7 2.3 13.5 83.6 | -0.110 | exp/nnet3/tdnn/decode_test/score_10_0.0/ctm.filt.filt.sys
# %WER 12.1 | 1155 27512 | 89.9 7.5 2.6 2.1 12.1 80.3 | -0.178 | exp/nnet3/tdnn/decode_test_rescore/score_10_0.0/ctm.filt.filt.sys
While a p-norm network in nnet3, matching the nnet2 setup as closely as possible, gets these results:
%WER 16.4 | 507 17792 | 86.4 9.8 3.8 2.9 16.4 90.7 | -0.064 | exp/nnet3/tdnn_new_splice__b/decode_dev/score_9_0.5/ctm.filt.filt.sys
%WER 14.9 | 507 17792 | 87.9 8.6 3.5 2.7 14.9 87.8 | -0.103 | exp/nnet3/tdnn_new_splice__b/decode_dev_rescore/score_10_0.0/ctm.filt.filt.sys
%WER 15.0 | 1155 27512 | 87.4 9.6 3.0 2.4 15.0 85.2 | 0.018 | exp/nnet3/tdnn_new_splice__b/decode_test/score_10_0.0/ctm.filt.filt.sys
%WER 13.5 | 1155 27512 | 88.7 8.5 2.9 2.2 13.5 82.3 | -0.093 | exp/nnet3/tdnn_new_splice__b/decode_test_rescore/score_9_0.0/ctm.filt.filt.sys
while the pnorm network from nnet2 had these results:
%WER 14.4 | 507 17792 | 88.3 8.4 3.3 2.8 14.4 88.0 | -0.081 | exp/nnet2_online/nnet_ms_sp/decode_dev/score_10_0.5/ctm.filt.filt.sys
%WER 13.0 | 507 17792 | 89.3 7.4 3.2 2.3 13.0 84.8 | -0.207 | exp/nnet2_online/nnet_ms_sp/decode_dev.rescore/score_10_0.5/ctm.filt.filt.sys
%WER 13.0 | 1155 27512 | 88.9 8.1 3.0 2.0 13.0 82.3 | -0.107 | exp/nnet2_online/nnet_ms_sp/decode_test/score_11_0.5/ctm.filt.filt.sys
%WER 11.8 | 1155 27512 | 90.3 7.3 2.3 2.1 11.8 79.6 | -0.200 | exp/nnet2_online/nnet_ms_sp/decode_test.rescore/score_10_0.0/ctm.filt.filt.sys
This is a bit of a mouthful. I'm summarizing the data now and will follow up in a moment.
Daniel -- aren't those nnet3 p-norm results with the buggy make_configs.py, where it was actually using ReLU and not p-norm, and a very tiny network? Remember Vijay fixed the script?
Dan
On Sun, Jun 19, 2016 at 7:59 PM, Daniel Galvez notifications@github.com wrote:
I did not. Here are the nnet3 baseline results:
(local/nnet3/run_tdnn.sh)
%WER 15.3 | 507 17792 | 87.4 9.0 3.6 2.7 15.3 90.1 | -0.081 | exp/nnet3/tdnn_sp/decode_dev/score_10_0.5/ctm.filt.filt.sys
%WER 13.9 | 507 17792 | 88.4 8.0 3.6 2.3 13.9 85.8 | -0.164 | exp/nnet3/tdnn_sp/decode_dev_rescore/score_10_0.5/ctm.filt.filt.sys
%WER 13.8 | 1155 27512 | 88.5 8.7 2.7 2.3 13.8 84.2 | -0.076 | exp/nnet3/tdnn_sp/decode_test/score_10_0.0/ctm.filt.filt.sys
%WER 12.5 | 1155 27512 | 89.6 7.7 2.6 2.1 12.5 81.5 | -0.133 | exp/nnet3/tdnn_sp/decode_test_rescore/score_10_0.0/ctm.filt.filt.sys
While a p-norm network in nnet3 with as much as possible the same as the nnet2 setup gets these results:
%WER 16.4 | 507 17792 | 86.4 9.8 3.8 2.9 16.4 90.7 | -0.064 | exp/nnet3/tdnn_new_spliceb/decode_dev/score_9_0.5/ctm.filt.filt.sys %WER 14.9 | 507 17792 | 87.9 8.6 3.5 2.7 14.9 87.8 | -0.103 | exp/nnet3/tdnn_new_spliceb/decode_dev_rescore/score_10_0.0/ctm.filt.filt.sys %WER 15.0 | 1155 27512 | 87.4 9.6 3.0 2.4 15.0 85.2 | 0.018 | exp/nnet3/tdnn_new_spliceb/decode_test/score_10_0.0/ctm.filt.filt.sys %WER 13.5 | 1155 27512 | 88.7 8.5 2.9 2.2 13.5 82.3 | -0.093 | exp/nnet3/tdnn_new_spliceb/decode_test_rescore/score_9_0.0/ctm.filt.filt.sys
while the pnorm network from nnet2 had these results:
%WER 14.4 | 507 17792 | 88.3 8.4 3.3 2.8 14.4 88.0 | -0.081 | exp/nnet2_online/nnet_ms_sp/decode_dev/score_10_0.5/ctm.filt.filt.sys %WER 13.0 | 507 17792 | 89.3 7.4 3.2 2.3 13.0 84.8 | -0.207 | exp/nnet2_online/nnet_ms_sp/decode_dev.rescore/score_10_0.5/ctm.filt.filt.sys %WER 13.0 | 1155 27512 | 88.9 8.1 3.0 2.0 13.0 82.3 | -0.107 | exp/nnet2_online/nnet_ms_sp/decode_test/score_11_0.5/ctm.filt.filt.sys %WER 11.8 | 1155 27512 | 90.3 7.3 2.3 2.1 11.8 79.6 | -0.200 | exp/nnet2_online/nnet_ms_sp/decode_test.rescore/score_10_0.0/ctm.filt.filt.sys
This is a bit of a mouthful. I'm summarizing the data now and will follow up in a moment.
I remember that Vijay fixed the script.
I looked into it. Even though the time stamps on my experimental directories are after Vijay's fix, they are using relu components. This is rather embarrassing. At the least, this was an experiment I wanted to do: Using the same splices as nnet2, while using relu, but I may want to do it with a higher dimension than 350. git reflog shows that I never switched the branch...
Overall, here are the results:
| | nnet2 p-norm baseline | nnet3, original baseline | nnet3 relu-dim 350, nnet2 splices | nnet3 relu-dim 425, nnet2 splices | nnet3, just like p-norm baseline (for real) |
|---|---|---|---|---|---|
| Dev | 14.4% | 14.6% | 16.4% | TODO | TODO |
| Dev, rescored | 13.0% | 13.2% | 14.9% | TODO | TODO |
| Test | 13.0% | 13.5% | 15.0% | TODO | TODO |
| Test, rescored | 11.8% | 12.1% | 13.5% | TODO | TODO |
The current nnet3 baseline (checked into master) does amazingly well, actually. But I intend to do the two columns marked with TODO for completeness.
Updating the previous table:
| | nnet2 p-norm baseline | nnet3, original baseline | nnet3 relu-dim 350, nnet2 splices | nnet3 relu-dim 425, nnet2 splices | nnet3, just like p-norm baseline (for real) | nnet3, relu-dim 1024 |
|---|---|---|---|---|---|---|
| Dev | 14.4% | 14.6% | 16.4% | 15.8% | 14.4% | 14.4% |
| Dev, rescored | 13.0% | 13.2% | 14.9% | 14.5% | 13.1% | 13.1% |
| Test | 13.0% | 13.5% | 15.0% | 14.7% | 13.4% | 13.2% |
| Test, rescored | 11.8% | 12.1% | 13.5% | 13.2% | 12.0% | 12.0% |
| Experiment | local/online/run_nnet2_ms_perturbed.sh | local/nnet3/run_tdnn.sh | - | - | local/nnet3/run_tdnn_new_splice_d.sh | local/nnet3/run_tdnn_new_splice_e.sh |
| # Parameters | 12,232,880 | 6,056,880 | ? | 3,227,930 | 12,232,880 | 12,678,952 |
So the p-norm in nnet3 is pretty competitive with the pnorm nnet2 model. Nice surprise.
What do you think about adding a p-norm example script to the tedlium setup? Dan
I am a bit confused by all of this. If p-norm was the best setup for nnet2 (and I recall reading one paper where p-norm showed better results than ReLU), why was ReLU chosen for the nnet3 implementation? Do we have the same behavior for WSJ or swbd?
@galv could you compare the number of parameters in the ReLU and p-norm models ?
@vince62s IIRC @pegahgh conducted the SWBD experiments where she found ReLU to be better than p-norm. Based on these observations I conducted experiments on AMI and found a similar trend. @pegahgh do you have these results around?
@vince62s there was a paper which came out of CLSP after the p-norm paper which simply says as an aside that they found relu easier to train than p-norm. So no particularly good reason.
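For readers following the relu-vs-p-norm discussion, the p-norm nonlinearity can be sketched roughly as below. This is a minimal illustration under my own assumptions, not the Kaldi implementation; `group_size` stands in for the ratio of the p-norm component's input to output dimension:

```python
import numpy as np

def pnorm(x, group_size, p=2):
    """Dimension-reducing p-norm nonlinearity: each output unit is the
    p-norm over one disjoint group of `group_size` consecutive inputs."""
    groups = x.reshape(-1, group_size)
    return np.sum(np.abs(groups) ** p, axis=1) ** (1.0 / p)
```

With p=2 each output is just the Euclidean norm of its input group, so the layer reduces the dimension by a factor of `group_size` while staying piecewise-smooth.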
It is too soon to decide whether to merge the p-norm model yet. The last thing I want to try is a ReLU model with about the same number of parameters (12,678,952) as the p-norm model (12,232,880). That is running right now.
I updated the table above with the number of parameters in each of my experiments.
If you look at the last two columns of the above table (I updated it again), you can see that a relu nnet3 model does just as well as nnet3 p-norm model when they have approximately the same parameters.
I'm not sure if we really want to merge these new training scripts. The new models beat the existing nnet3 baseline by a few tenths of a percent at the cost of double the number of parameters. I'm not sure what to do here.
Since it would be preferable not to have huge models, I would recommend not merging these scripts. You could add a comment in the PR describing your reasoning in detail for the sake of future reference.
--Vijay
@galv Can you please confirm that all these results (nnet2 and nnet3) are for 6 hidden layers, 4 epochs training ? Also what you call "baseline nnet3 original" in the second column is for ReLU dim 500, right ?
@vince62s
6 hidden layers: yes. Absolutely positive that every experiment used 6 hidden layers.
4 epochs: I have checked the scripts that I still have, and they all use 4 epochs. Unfortunately, I made a mistake and no longer have the scripts for the "nnet3 relu-dim 350, nnet2 splices" and "nnet3 relu-dim 425, nnet2 splices" experiments, so I don't know how many epochs they were trained for; but they have few parameters anyway, so we shouldn't expect them to compete with the other models.
baseline nnet3 original: yes, but the splicing is different. The splice indices in that model came from the chain models' splice indices. The script is up in the master branch repo.
I ran local/nnet3/run_tdnn.sh on Tedlium release 2, with the equivalent of your second column. My numbers are [ 13.5% 12.3% 12.2% 11.0%]
I have not run the discriminative training yet. For that, I have an issue with the number of epochs. For nnet2, when we say num_epochs=4 it actually starts at epoch 0 and spits out 5 results, and there is still a slight improvement at epoch 4 over epoch 3. For nnet3, num_epochs=4 is 4 epochs, BUT on release 1 it seems that epoch 4 is slightly worse than epochs 3 and 2 (epoch 2 being the best one).
I'm personally not too concerned about the nnet2 vs nnet3 differences there: the models do use different nonlinearities and splicing. And the increase in WER at the end is likely due to overfitting, since the nnet3 model is smaller and has less modeling power.
Also, if you want to run the experiment for the rightmost column, just let me know.
@danpovey sorry to bother but when I read this in run_nnet2_ms.sh:
steps/nnet2/train_multisplice_accel2.sh --stage $train_stage \
  --num-epochs 8 --num-jobs-initial 3 --num-jobs-final 3 \
  --num-hidden-layers 6 --splice-indexes "layer0/-2:-1:0:1:2 layer1/-1:2 layer3/-3:3 layer4/-7:2"
which is what you outline above, and it corresponds to the splice values in the paper, which mentions 4 layers.
So 2 questions: does the fact that layer2 is omitted in the config remove the layer completely? Is --num-hidden-layers 6 relevant?
Thanks.
I think what you are referring to is the diagram in the paper, which was just an example. Yes, --num-hidden-layers 6 is relevant. Because layers 2 and 5 have no splicing specified, there is no splicing at those layers, so they are like a feedforward DNN.
In nnet3 scripts, this is done differently: you specify the splicing indexes (with zeros for feedforward layers), and the number of layers is inferred from that. [However, in chain scripts, the final '0' in the splicing indexes is implicit, so you don't need to specify that.]
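To make the splicing concrete: because per-layer offsets compound, the total temporal context the network sees is just the sum of the per-layer minima and maxima. A small sketch (`total_context` is a hypothetical helper, assuming the splice indexes are written out with explicit 0s for the feedforward layers, nnet3-style):

```python
def total_context(splice_indexes):
    """Left/right input context of a TDNN given per-layer splice offsets.
    Offsets compound across layers, so the totals are sums of the
    per-layer minimum and maximum offsets."""
    left = sum(min(layer) for layer in splice_indexes)
    right = sum(max(layer) for layer in splice_indexes)
    return left, right

# The nnet2 multisplice config above, with [0] for feedforward layers 2 and 5:
splices = [[-2, -1, 0, 1, 2], [-1, 2], [0], [-3, 3], [-7, 2], [0]]
```

For that config the model sees 13 frames of left context and 9 frames of right context per output frame.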
Thanks, yes I was referring to the graph in the paper because it's exactly these splices on 4 layers, and the results section mentions 4 layers. Never mind. Just curious, how were these splice sets found? Same for the other set in the nnet3 baseline script. Just empirically? After many tries?
Empirically, basically. E.g. we noticed that it's generally better to have them more widely spaced at the output side. [this is a bit like downsampling in CNNs.] We tried various configurations but nothing systematic. Dan
What is the exact formula to calculate the number of parameters in both scenarios, p-norm and ReLU?
I ran nnet3/run_tdnn.sh 6 hidden layers relu dim 500 splices -1,0,1 -1,0,1,2 -3,0,3 -3,0,3 -3,0,3 -6,-3,0 (I think coming from the chain config) ==> nnet3-am-info gives me 6138543 parameters
6 hidden layers relu dim 500 splices -2,-1,0,1,2 -1,2 0 -3,3 -7,2 0 (coming from the nnet2 recipe, supposedly from discussion above too) ==> nnet3-am-info gives me 4178543 parameters
@galv I am a bit confused by your second column that I understood to be my second config above. Am I correct ? is there another variable impacting the number of parameters ?
Thanks,
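A back-of-the-envelope answer: each TDNN layer is an affine transform whose input is the previous layer's output spliced across the listed offsets. The sketch below (`tdnn_param_count` is a hypothetical helper) ignores LDA/i-vector preprocessing, presoftmax scaling, and the p-norm group structure, so it will not exactly reproduce nnet3-am-info's numbers:

```python
def tdnn_param_count(splice_indexes, hidden_dim, feat_dim, num_targets):
    """Rough parameter count for a plain ReLU TDNN: per layer, a weight
    matrix of (spliced input dim x hidden dim) plus biases, then a final
    affine layer to the output targets."""
    total, in_dim = 0, feat_dim
    for layer in splice_indexes:
        spliced_dim = in_dim * len(layer)        # splicing widens the input
        total += spliced_dim * hidden_dim + hidden_dim
        in_dim = hidden_dim
    total += in_dim * num_targets + num_targets  # output affine layer
    return total
```

This also shows why the chain-style config above costs more parameters than the nnet2-style one: every extra splice offset at a layer multiplies that layer's weight-matrix size.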
The nnet3 models are a bit worse than the best nnet2 models. @danpovey pointed out that there are differences in the TDNN architectures used in nnet2 and nnet3, and in some other hyperparameters. @galv has offered to tune the nnet3 models.
Relevant discussion is available in issue https://github.com/kaldi-asr/kaldi/issues/768. This issue has been created to track the progress of the nnet3 model tuning separately.