This was trained on 960 hours, I assume? I'm surprised that your "k2 ctc decode" number was obtained without the 3-gram LM; I had thought you were using that. What do you think are the differences between our trained model and the espnet-trained model? Is it possible to compare the diagnostics from training?
This was trained on 960 hours, I assume?
Yes, with the full 960-hour LibriSpeech set.
What do you think are the differences between our trained model and the espnet-trained model?
A known big difference is the learning-rate schedule: espnet uses a warmup scheduler, while I use the Noam optimizer in snowfall. With warmup_step = 40000 and model_size = 512, espnet's learning rate at each step is around 10 times the one in this experiment. So I am going to retrain the model, changing the lr-factor from 1.0 to 10.0, after the current experiment finishes.
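For reference, a minimal sketch of the standard Noam schedule (from "Attention Is All You Need"); treating the `factor` parameter as the lr-factor mentioned above is an assumption, but it shows how the factor scales the whole curve uniformly:

```python
def noam_lr(step: int, model_size: int = 512, factor: float = 1.0,
            warmup: int = 40000) -> float:
    """Noam schedule: factor * d_model^-0.5 * min(step^-0.5, step * warmup^-1.5)."""
    return factor * model_size ** -0.5 * min(step ** -0.5, step * warmup ** -1.5)

# The factor multiplies the rate at every step, so factor=10.0 gives
# exactly 10x the learning rate of factor=1.0 throughout training:
print(noam_lr(40000, factor=1.0))   # peak lr at step == warmup, ~2.2e-4
print(noam_lr(40000, factor=10.0))  # ~2.2e-3
```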
Is it possible to compare the diagnostics from training?
Yes, I have reproduced the espnet result and obtained a detailed training log, which will be used to diagnose my training process.
You might want to check what data augmentation techniques and settings they are using and compare them with our setup. If we’re missing some techniques in Lhotse we can add them.
So I guess this is ready to merge?
Maybe @csukuangfj is going to review this afternoon.
+2
Thanks! Merging
This PR releases a snowfall-trained model together with the related decoding code. WER on test-clean is lower than that of the previously trained snowfall model; a detailed comparison follows:
A thing worth mentioning: the current no-rescore result (3.97 on test-clean) is obtained WITHOUT a 3-gram LM. The result may get lower by composing the current ctc_topo with a 3-gram FST (I am working on this).
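A minimal sketch of what that composition might look like with k2 (assumptions: `ctc_topo` and `G` are already built as `k2.Fsa` objects, with `G` loaded from the 3-gram LM; the actual snowfall graph-building code may differ):

```python
import k2

# Assumption: ctc_topo is the CTC topology FSA and G is the 3-gram LM
# already loaded as a k2.Fsa. Arc-sort both before composing.
ctc_topo = k2.arc_sort(ctc_topo)
G = k2.arc_sort(G)

# Compose the CTC topology with the LM to obtain the decoding graph.
decoding_graph = k2.arc_sort(k2.compose(ctc_topo, G))
```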
Another baseline for this model is an espnet-released model; a detailed comparison follows. num_paths = 100 when doing the n-best rescoring of row 2; the result in row 2 is obtained using techniques similar to those in https://github.com/k2-fsa/snowfall/pull/201, by loading the espnet-released model with snowfall code.
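For context, a hedged sketch of the first step of that n-best rescoring, sampling paths from the decoding lattice with k2 (assumption: `lattices` is the `k2.Fsa` holding the decoding lattices; the real code in that PR also deduplicates the sampled paths and rescores each one before picking the best):

```python
import k2

# Assumption: `lattices` is a k2.Fsa containing one decoding lattice per
# utterance. k2.random_paths samples num_paths paths from each lattice;
# the sampled paths are then rescored and the best-scoring one is kept.
paths = k2.random_paths(lattices, use_double_scores=True, num_paths=100)
```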
Conclusions: