kaldi-asr / kaldi

kaldi-asr/kaldi is the official location of the Kaldi project.
http://kaldi-asr.org

Tedlium chain config question #914

Closed vince62s closed 7 years ago

vince62s commented 7 years ago

When running the chain config script for Tedlium, I saw that run_tdnn.sh and run_ivector_common.sh are not in sync, since the former calls the latter with --speed-perturb true, which is not defined in the repo version. Also, Dan fixed this one: ali_dir=tri3_ali_sp.
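For reference, a minimal sketch of how the callee could accept the missing flag. It inlines the option-parsing that Kaldi scripts normally get from utils/parse_options.sh; the simulated caller and the directory names are illustrative only, not the recipe's real code:

```shell
#!/usr/bin/env bash
# Hypothetical sketch: define the --speed-perturb option that run_tdnn.sh
# passes (Kaldi scripts usually handle this via utils/parse_options.sh).
speed_perturb=true   # default; overridden by --speed-perturb <true|false>

set -- --speed-perturb true   # simulate the caller's invocation
while [ $# -gt 0 ]; do
  case "$1" in
    --speed-perturb) speed_perturb="$2"; shift 2 ;;
    *) shift ;;
  esac
done

# Downstream, the flag would select the speed-perturbed alignment dir,
# matching the ali_dir=tri3_ali_sp fix mentioned above:
if [ "$speed_perturb" = true ]; then suffix=_sp; else suffix=; fi
ali_dir=tri3_ali${suffix}
echo "ali_dir=$ali_dir"
```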

Now my question is: is the repo version of the chain script the last version related to the paper, and consequently, is it the best config according to the various tests done in the SWBD repo?

thanks. Vincent

galv commented 7 years ago

Sounds like the --speed-perturb thing would cause a crash. Let me check it out.

Regarding ali_dir=tri3_ali_sp, Vijay actually advised me to use non-speed-perturbed alignments for tree building. I can't remember if $ali_dir was used anywhere else.

The repo version was the last version as of the first submission. We got about 1% absolute WER decrease for the camera-ready submission with some data cleanup scripts by Vimal Manohar, but that hasn't made it into master. I'd recommend waiting on Dan's changes to tedlium to explore the effect of data cleanup for tedlium 2.

vince62s commented 7 years ago

Okay, thanks. Just out of curiosity, which swbd baseline script did you start with? 'Cause now there are so many.

galv commented 7 years ago

Initially, it was based on swbd/s5c/local/chain/run_tdnn_2y.sh. However, this used the "jesus" nonlinearity, which we later abandoned. Since then, it developed organically, partly with hints from Vijay and Dan. Is there anything in particular you're wondering about?

vince62s commented 7 years ago

Nothing in particular, just making sure it has the best-known parameters. Actually it has 7 layers right now with the current splice indexes (run_tdnn_2y.sh has only 5). Btw, with the same relu dim (500) and the same splice indexes ("-1,0,1 -1,0,1,2 -3,0,3 -3,0,3 -3,0,3 -6,-3,0 0"), the chain model ends up with more parameters than the nnet3 equivalent. Might be due to the output dim (?).

Anyway, on tedlium2 it's giving me the best results right now (with 7.98M parameters vs 6.39M for nnet3):

%WER 12.4 | 507 17792 | 89.8 7.4 2.8 2.2 12.4 83.6 | -0.162 | exp/chain/tdnn/decode_dev/score_8_0.5/ctm.filt.filt.sys
%WER 11.6 | 507 17792 | 90.5 6.9 2.5 2.1 11.6 80.1 | -0.247 | exp/chain/tdnn/decode_dev_rescore/score_8_0.0/ctm.filt.filt.sys
%WER 11.4 | 1155 27512 | 90.2 6.8 3.0 1.6 11.4 78.5 | -0.042 | exp/chain/tdnn/decode_test/score_9_0.0/ctm.filt.filt.sys
%WER 10.4 | 1155 27512 | 91.1 6.2 2.7 1.5 10.4 74.6 | -0.167 | exp/chain/tdnn/decode_test_rescore/score_8_0.0/ctm.filt.filt.sys

numbers for nnet3 [13.2% 12.1% 12.2% 10.9%]
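The parameter-count question above is easy to sanity-check with a back-of-the-envelope sketch, under the assumption that each TDNN hidden layer is an affine transform over its spliced input, i.e. (spliced_frames * input_dim + 1) * output_dim parameters. The first-layer input dim here is a stand-in, not the recipe's real feature+iVector dimension:

```shell
#!/usr/bin/env bash
# Rough hidden-layer parameter count for a TDNN with relu-dim 500.
# Frame counts per layer come from the splice indexes
# "-1,0,1 -1,0,1,2 -3,0,3 -3,0,3 -3,0,3 -6,-3,0 0" (3,4,3,3,3,3,1 frames).
relu_dim=500
splices="3 4 3 3 3 3 1"
total=0
prev=$relu_dim          # stand-in input dim for the first layer
for n in $splices; do
  total=$(( total + (n * prev + 1) * relu_dim ))
  prev=$relu_dim
done
echo "hidden-layer params: $total"

# The output layer adds roughly (relu_dim + 1) * num_leaves, and the chain
# and nnet3 trees can have different leaf counts, which is one place the
# totals could diverge, as suggested above.
```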

danpovey commented 7 years ago

Cool. Hopefully the chain models will be even better once we get the transcript clean-up finished. Tedlium [at least release 1] has extremely noisy transcripts which will tend to affect the chain model a lot. Dan


vince62s commented 7 years ago

I would like to try the left-biphone config with Tedlium. If my understanding is correct, is it just a matter of changing the following line, utils/mkgraph.sh --left-biphone, or do I need to do more? Thanks.

danpovey commented 7 years ago

You need to do more. See diff ../../swbd/s5c/local/chain/run_tdnn_7{b,d}.sh

The key changes are: first, if you use a shared tree-dir you should change it; second, add the option

>       --context-opts "--context-width=2 --central-position=1" \

to steps/nnet3/chain/build_tree.sh.
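Putting those pieces together, the call might look roughly like this sketch (the leaf count, the directory paths, and the --frame-subsampling-factor value are stand-ins; diff the swbd run_tdnn_7b/7d scripts for the real values):

```
steps/nnet3/chain/build_tree.sh \
  --frame-subsampling-factor 3 \
  --context-opts "--context-width=2 --central-position=1" \
  --cmd "$train_cmd" 4000 data/train_sp data/lang_chain \
  exp/tri3_ali exp/chain/tree_bi
```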


vince62s commented 7 years ago

OK, the left-biphone results are very close to the "regular" chain:

%WER 12.5 | 507 17792 | 89.6 7.2 3.2 2.2 12.5 84.8 | -0.077 | exp/chain/tdnn/decode_dev/score_9_0.0/ctm.filt.filt.sys
%WER 11.6 | 507 17792 | 90.5 6.6 2.9 2.1 11.6 82.1 | -0.214 | exp/chain/tdnn/decode_dev_rescore/score_8_0.0/ctm.filt.filt.sys
%WER 11.4 | 1155 27512 | 90.3 7.0 2.7 1.7 11.4 77.9 | -0.067 | exp/chain/tdnn/decode_test/score_8_0.0/ctm.filt.filt.sys
%WER 10.3 | 1155 27512 | 91.2 6.0 2.9 1.4 10.3 73.9 | -0.153 | exp/chain/tdnn/decode_test_rescore/score_8_0.0/ctm.filt.filt.sys

danpovey commented 7 years ago

OK -- it's hard to interpret these without seeing the baseline; but anyway, please commit the change. BTW, when I tried the left-biphone models on the AMI s5b recipe, there was a clear improvement, something like 0.4% absolute on average across all the conditions. Dan


vince62s commented 7 years ago

The baseline was 5 comments above the last one, same thread. Not committing, since there is no script for nnet3 or chain right now, but just letting you know for when you commit your script: the baseline could become left-biphone then.

danpovey commented 7 years ago

OK thanks. I'm starting from the AMI s5b scripts when creating the TEDLIUM nnet3 and chain scripts. The AMI s5b scripts currently use left-biphone, which was quite a bit better on that setup. Of course if the results are worse than the existing scripts we'll have to do some more investigation.

vince62s commented 7 years ago

@danpovey @vijayaditya, I noticed that the splice indexes copied over from the AMI s5b recipes are different from what was usually taken: it has 2 final layers of "0 0". Are there any improvements using this splicing versus the previous ones?

vijayaditya commented 7 years ago

IIRC, the TDNN in the AMI recipe has more layers, as I wanted to create a comparison with a recipe involving a different neural network toolkit. I think it was also giving me better results at that time.

--Vijay


danpovey commented 7 years ago

I just copied it over. @vince62s-- I haven't really done any tuning on this recipe at all, I've just guessed at all the parameter settings. If you have time to do any tuning it would be great. I'm about to check in some changes to the nnet3 and chain TDNN setups for tedlium s5_r2, which will include the results.

Dan


vince62s commented 7 years ago

I am running the new baseline now to see if I match the results first. BTW, I had to comment out a line in run_tdnn.sh:

touch $dir/egs/.nodelete # keep egs around when that run dies.

otherwise it hangs, because this directory does not exist before the first run.

When done, I will try various setups.
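A guarded version of that line would avoid the failure by creating the egs directory first. This sketch uses a throwaway stand-in for the real experiment dir:

```shell
#!/usr/bin/env bash
# The failing line assumed $dir/egs already existed on a fresh run.
dir=$(mktemp -d)             # stand-in for the real experiment dir
mkdir -p "$dir/egs"          # the missing step: create the dir first
touch "$dir/egs/.nodelete"   # keep egs around when that run dies
[ -f "$dir/egs/.nodelete" ] && echo "marker created"
```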

danpovey commented 7 years ago

thanks, fixing that bug.


vince62s commented 7 years ago

Hmm, weird: my chain results with no cleanup are different from yours:

%WER 10.6 | 507 17783 | 90.8 6.5 2.7 1.5 10.6 80.1 | -0.007 | exp/chain/tdnn_sp_bi/decode_dev/score_8_0.0/ctm.filt.filt.sys
%WER 10.1 | 507 17783 | 91.0 5.5 3.5 1.1 10.1 80.3 | 0.096 | exp/chain/tdnn_sp_bi/decode_dev_rescore/score_10_0.0/ctm.filt.filt.sys
%WER 10.5 | 1155 27500 | 90.8 6.2 3.1 1.3 10.5 76.5 | 0.048 | exp/chain/tdnn_sp_bi/decode_test/score_9_0.0/ctm.filt.filt.sys
%WER 9.9 | 1155 27500 | 91.1 5.4 3.4 1.1 9.9 73.0 | 0.073 | exp/chain/tdnn_sp_bi/decode_test_rescore/score_10_0.0/ctm.filt.filt.sys

Yours are:

%WER 11.0 | 507 17783 | 90.9 6.5 2.6 1.9 11.0 80.5 | 0.004 | exp/chain/tdnn_sp_bi/decode_dev/score_8_0.0/ctm.filt.filt.sys
%WER 10.6 | 507 17783 | 90.7 5.5 3.8 1.3 10.6 79.3 | 0.070 | exp/chain/tdnn_sp_bi/decode_dev_rescore/score_10_0.0/ctm.filt.filt.sys
%WER 10.1 | 1155 27500 | 91.2 6.0 2.8 1.3 10.1 75.5 | -0.004 | exp/chain/tdnn_sp_bi/decode_test/score_8_0.0/ctm.filt.filt.sys
%WER 9.8 | 1155 27500 | 91.2 5.2 3.7 1.0 9.8 73.2 | 0.055 | exp/chain/tdnn_sp_bi/decode_test_rescore/score_10_0.0/ctm.filt.filt.sys

danpovey commented 7 years ago

It looks like, for you, the LM rescoring makes much less difference. Did you regenerate your language models based on the latest scripts? I pruned the model smaller than yours, so the graphs wouldn't be as large. Dan


vince62s commented 7 years ago

Yes, I rebuilt everything.

vince62s commented 7 years ago

Actually, I have to check the lexicon I took...

vince62s commented 7 years ago

I think you misread: my rescoring gains are -0.5% and -0.6%, whereas yours are -0.4% and -0.3%. I checked my vocab, which is different from yours. You take Cantab-tedlium.dct, which is 150000 words (from the original recipe script), and I take the merger of this one and the tedlium release-2 one, which leads to 191010 words. That does explain the small improvement on the dev set, but it does not explain the worse results on the test set; the difference of 10.5 versus 10.1 seems too big.

vince62s commented 7 years ago

I also think the method is different. I guess you trained on cleaned data, first decoded the cleaned data, and then just decoded the non-cleaned data with the model pre-trained on cleaned data, since I read this: local/chain/run_tdnn.sh --train-set train --gmm tri3 --nnet3-affix "" --stage 20 (stage 20 is just decoding).

My run was on non-cleaned data all the way through, including training.

danpovey commented 7 years ago

Firstly, ignore the --stage, that shouldn't have been there. I didn't do anything weird or complicated-- I would never do that without making it clear in the README. Anyway the cleanup only happens on training data, not test, so what you're saying isn't feasible.

The runs aren't exactly replicable anyway, but if your LMs and lexicons are not the same it's even less expected that it would be exactly the same. If you can verify that some change in the lexicon-building procedure or LM-building procedure would be better, then we can talk about making that change.

Dan


vince62s commented 7 years ago

OK, but then I want to replicate your results first. Can I just rebuild the LM + L + G and then make the graph, without retraining everything?

danpovey commented 7 years ago

That should take you most of the way there -- however, you should verify that phones.txt does not change (it shouldn't, but just check). Also, you decode with the lexicon that has pronprobs, so you'd need to rerun the things in stage 5 of run.sh that get the pronprobs.

The training of the models wouldn't be exactly the same, but it should be about the same.

Dan
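The phones.txt check can be as simple as a byte-for-byte comparison of the old and new files. This sketch uses tiny stand-in files rather than real data/lang directories:

```shell
#!/usr/bin/env bash
# Sanity check before reusing trained models with a rebuilt lang dir:
# the phone inventory must be identical. Stand-in files simulate
# data/lang_old/phones.txt and data/lang_new/phones.txt.
tmp=$(mktemp -d)
printf '<eps> 0\nSIL 1\nAA 2\n' > "$tmp/phones_old.txt"
printf '<eps> 0\nSIL 1\nAA 2\n' > "$tmp/phones_new.txt"

if cmp -s "$tmp/phones_old.txt" "$tmp/phones_new.txt"; then
  verdict="phones.txt unchanged: safe to rebuild G and the graph only"
else
  verdict="phones.txt differs: models must be retrained"
fi
echo "$verdict"
```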


vince62s commented 7 years ago

Very different results, so I am re-running from scratch; I want a clean start.

vince62s commented 7 years ago

My results, from scratch (no cleanup): for the dev set, a -0.1 difference with yours before rescoring; for the test set, a +0.3 difference with yours before rescoring. Does this make sense? I thought it would be the same. The only difference is that I run on a single machine with initial and final job numbers = 2 at the nnet training level.

%WER 10.9 | 507 17783 | 90.4 6.0 3.5 1.3 10.9 81.3 | 0.048 | exp/chain/tdnn_sp_bi/decode_dev/score_9_0.0/ctm.filt.filt.sys
%WER 10.6 | 507 17783 | 90.5 5.4 4.1 1.1 10.6 80.1 | 0.060 | exp/chain/tdnn_sp_bi/decode_dev_rescore/score_10_0.0/ctm.filt.filt.sys
%WER 10.4 | 1155 27500 | 90.8 5.9 3.3 1.2 10.4 75.7 | 0.082 | exp/chain/tdnn_sp_bi/decode_test/score_9_0.0/ctm.filt.filt.sys
%WER 9.9 | 1155 27500 | 91.0 5.2 3.8 1.0 9.9 73.9 | 0.090 | exp/chain/tdnn_sp_bi/decode_test_rescore/score_10_0.0/ctm.filt.filt.sys

danpovey commented 7 years ago

If the num-jobs-nnet values are different, we don't expect the WERs to be the same [even if they are the same, things aren't 100% reproducible anyway; this is unavoidable due to the way GPUs work]. So it's OK.

Dan
