k2-fsa / snowfall

Moved to https://github.com/k2-fsa/icefall

bpe ctc decoder with a released model #217

Closed glynpu closed 3 years ago

glynpu commented 3 years ago

This PR releases a snowfall-trained model together with the related decoding code. The WER on test-clean is lower than that of the previously trained snowfall model; a detailed comparison follows:

| avg epoch 26-30 | no rescore, test-clean | no rescore, test-other | 4-gram lattice rescore, test-clean | 4-gram lattice rescore, test-other |
|---|---|---|---|---|
| before (with LF-MMI loss) | 4.14 | 8.41 | 3.69 | 7.68 |
| current | 3.97 | 9.78 | * | * |
INFO:root:[test-clean] %WER 3.97% [2087 / 52576, 220 ins, 166 del, 1701 sub ]
INFO:root:[test-other] %WER 9.78% [5121 / 52343, 535 ins, 439 del, 4147 sub ]
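
For reference, the reported %WER is simply (insertions + deletions + substitutions) divided by the number of reference words; a quick check of the test-clean line above:

```python
# Sanity check of the test-clean WER line above.
ins, dels, subs = 220, 166, 1701   # error counts from the log
ref_words = 52576                  # total reference words in test-clean

errors = ins + dels + subs         # 2087, matching the bracketed total
wer = 100.0 * errors / ref_words
print(f"%WER {wer:.2f}%")          # -> %WER 3.97%
```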

One thing worth mentioning: the current no-rescore result (3.97 on test-clean) is obtained WITHOUT a 3-gram LM. The result may improve further by composing the current ctc_topo with a 3-gram FST (I am working on this).
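
A minimal sketch of what that composition could look like with k2, assuming a 3-gram LM already exported in OpenFst text format (the file path and vocabulary size below are placeholders, not the actual recipe):

```python
import k2

# Hypothetical sketch: build a decoding graph by composing the CTC topology
# with a 3-gram LM, instead of decoding with ctc_topo alone.
max_token_id = 5000  # placeholder BPE vocabulary size
ctc_topo = k2.arc_sort(k2.ctc_topo(max_token_id))

# Assume the 3-gram LM has been converted to OpenFst text format beforehand.
with open("data/lang_bpe/G_3gram.fst.txt") as f:
    G = k2.arc_sort(k2.Fsa.from_openfst(f.read(), acceptor=False))

decoding_graph = k2.compose(ctc_topo, G, treat_epsilons_specially=True)
decoding_graph = k2.arc_sort(k2.connect(decoding_graph))
```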

Another baseline for this model is an espnet released model; a detailed comparison follows. num_paths = 100 is used when doing n-best rescoring in row 2; the results in row 2 are obtained with techniques similar to those in https://github.com/k2-fsa/snowfall/pull/201, i.e. loading the espnet released model with snowfall code. A rough sketch of the score combination used for rescoring is shown after the table.

| decoding algorithm | training tool | encoder + k2 CTC decode, no rescore | encoder + k2 CTC decode + decoder n-best rescore | encoder + k2 CTC decode + transformer LM n-best rescore | encoder + k2 CTC decode + decoder n-best rescore + transformer LM n-best rescore |
|---|---|---|---|---|---|
| decoder algorithm in espnet | espnet | * | * | * | 2.1% |
| k2 CTC decode in this PR | espnet | 2.97 | 2.64 | 2.43 | 2.35 |
| k2 CTC decode in this PR | snowfall | 3.97 | * | * | * |
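
Purely as an illustration of how the n-best rescoring combines scores (not the actual PR code): each of the num_paths = 100 hypotheses gets a CTC/lattice score, optionally an attention-decoder score and a transformer-LM score, and the best-scoring path is selected.

```python
import torch

# Illustrative only: combine per-hypothesis scores for n-best rescoring.
# In the real pipeline the hypotheses come from sampling num_paths = 100
# paths from the k2 lattice; here they are placeholder tensors.
num_paths = 100
ctc_scores = torch.randn(num_paths)        # lattice/CTC score per path
decoder_scores = torch.randn(num_paths)    # attention-decoder rescore
lm_scores = torch.randn(num_paths)         # transformer-LM rescore

total = ctc_scores + decoder_scores + lm_scores  # unweighted sum for simplicity
best = torch.argmax(total).item()
print(f"best hypothesis index: {best}")
```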

Conclusions:

  1. A better snowfall-trained model is obtained before rescoring.
  2. The current training pipeline is still inferior to its espnet counterpart; once this is fixed, the current WER of 3.97% on test-clean should get close to 2.97% (related training code will be submitted later this week; I am making this promise here to force myself to do it quickly).

danpovey commented 3 years ago

This was trained on 960 hours, I assume? I'm surprised that your "k2 ctc decode" number was obtained without the 3-gram LM; I had thought you were using that. What do you think are the differences between our trained model and the espnet-trained model? Is it possible to compare the diagnostics from training?

glynpu commented 3 years ago

> This was trained on 960 hours, I assume?

Yes, with full libri.

> What do you think are the differences between our trained model and the espnet-trained model?

A known big difference is that espnet uses a warm-up scheduler, while I use the Noam optimizer in snowfall. With warmup_step = 40000 and model_size = 512, the learning rate in espnet is around 10 times that of this experiment at each step. So I am going to retrain the model with lr-factor changed from 1.0 to 10.0 after the current experiment finishes.
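
For context, a minimal sketch of the standard Noam schedule (assuming the usual formula, with warmup_step = 40000 and model_size = 512 as above), showing that lr-factor simply scales the whole curve:

```python
def noam_lr(step: int, model_size: int = 512, factor: float = 1.0,
            warmup: int = 40000) -> float:
    """lr = factor * model_size^-0.5 * min(step^-0.5, step * warmup^-1.5)"""
    return factor * model_size ** -0.5 * min(step ** -0.5, step * warmup ** -1.5)

# Raising factor from 1.0 to 10.0 multiplies the learning rate at every step
# by 10, which is roughly the gap described above versus the espnet recipe.
for step in (1000, 40000, 100000):
    print(step, noam_lr(step, factor=1.0), noam_lr(step, factor=10.0))
```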

> Is it possible to compare the diagnostics from training?

Yes, I have reproduced the espnet result and obtained detailed training logs, which will be used to diagnose my training process.
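
One simple way to do that comparison (just a sketch; the log format and file names below are made up and must be adapted to the actual espnet/snowfall logs) is to pull per-epoch losses out of both logs and print them side by side:

```python
import re

# Hypothetical sketch: extract "epoch N ... loss X" pairs from two training
# logs. The regex must be adapted to the real espnet/snowfall log formats.
PATTERN = re.compile(r"epoch[ =](\d+).*?loss[ =:]+([0-9.]+)")

def read_losses(path):
    losses = {}
    with open(path) as f:
        for line in f:
            m = PATTERN.search(line)
            if m:
                losses[int(m.group(1))] = float(m.group(2))
    return losses

espnet = read_losses("espnet_train.log")      # assumed file names
snowfall = read_losses("snowfall_train.log")
for epoch in sorted(set(espnet) & set(snowfall)):
    print(f"epoch {epoch}: espnet {espnet[epoch]:.3f} vs snowfall {snowfall[epoch]:.3f}")
```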

pzelasko commented 3 years ago

You might want to check what data augmentation techniques and settings they are using and compare them with our setup. If we’re missing some techniques in Lhotse we can add them.
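
For example, a minimal sketch of the knobs worth lining up against the espnet recipe (assuming Lhotse's CutSet.perturb_speed and lhotse.dataset.SpecAugment; the manifest path and settings are placeholders, not our actual recipe):

```python
import torch
from lhotse import combine, load_manifest
from lhotse.dataset import SpecAugment

# Hypothetical sketch of augmentation settings to compare with espnet:
# 3-fold speed perturbation on the cuts, and SpecAugment on feature batches.
cuts = load_manifest("cuts_train.jsonl.gz")  # placeholder manifest path
cuts = combine(cuts, cuts.perturb_speed(0.9), cuts.perturb_speed(1.1))

spec_augment = SpecAugment()                 # library defaults
features = torch.randn(8, 1000, 80)          # dummy (batch, time, freq) batch
augmented = spec_augment(features)
```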

danpovey commented 3 years ago

So I guess this is ready to merge?

glynpu commented 3 years ago

Maybe @csukuangfj is going to review this afternoon.

csukuangfj commented 3 years ago

+2

csukuangfj commented 3 years ago

Thanks! Merging