k2-fsa / snowfall

Moved to https://github.com/k2-fsa/icefall
Apache License 2.0

Conformer & convergence #169

Open danpovey opened 3 years ago

danpovey commented 3 years ago

Got this via email from @zhu-han ...

I got the first reasonable result:
2021-04-21 02:28:00,831 INFO [common.py:365] [test-clean] %WER 13.39% [7041 / 52576, 887 ins, 965 del, 5189 sub ]
2021-04-21 02:29:42,636 INFO [common.py:365] [test-other] %WER 35.57% [18619 / 52343, 1327 ins, 3534 del, 13758 sub ]
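As a sanity check on the two log lines above, the reported percentages can be reproduced from the bracketed counts, assuming the usual definition WER = (insertions + deletions + substitutions) / reference words. The function name `wer_percent` is made up for this sketch and is not part of snowfall.

```python
def wer_percent(ins, dels, subs, ref_words):
    """Return (total_errors, WER in percent) from edit-operation counts."""
    errors = ins + dels + subs
    return errors, 100.0 * errors / ref_words


# Counts copied from the log lines above.
clean_errors, clean_wer = wer_percent(887, 965, 5189, 52576)
other_errors, other_wer = wer_percent(1327, 3534, 13758, 52343)

print(f"test-clean: {clean_errors} errors, {clean_wer:.2f}% WER")  # 7041 errors, 13.39% WER
print(f"test-other: {other_errors} errors, {other_wer:.2f}% WER")  # 18619 errors, 35.57% WER
```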
The training log is in the attachment.
The code for this version is at https://github.com/zhu-han/snowfall/commit/4d4a0c42c175571e396736c757ceb6698afc9b18
The differences from the original version in the paper are (this version vs. original version):
1) 4× subsampling vs. 8× subsampling;
2) kernel size 3 vs. kernel size 5.
The model could not converge without these two changes. 
But the performance gap between this and Conformer is still large.
Do you have any advice on that?
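To make the subsampling difference in the email concrete, here is a rough sketch of how many frames reach the encoder under 4× vs. 8× subsampling. It assumes the subsampling is a stack of unpadded stride-2 convolutions (two layers for 4×, three for 8×) and a 10 ms input frame shift. Neither assumption is stated in the email, and the email does not say which layer the kernel size refers to, so `kernel` here is only a parameter of the standard output-length formula. The helper names are hypothetical.

```python
def conv_out_len(t, kernel, stride):
    """Output length of an unpadded 1-D convolution over t frames."""
    return (t - kernel) // stride + 1


def subsampled_len(t, kernel, num_layers, stride=2):
    """Apply num_layers stride-2 convolutions in sequence (assumed layout)."""
    for _ in range(num_layers):
        t = conv_out_len(t, kernel, stride)
    return t


num_frames = 1000  # about 10 s of audio at an assumed 10 ms frame shift

print(subsampled_len(num_frames, kernel=3, num_layers=2))  # 4x: 249 frames
print(subsampled_len(num_frames, kernel=3, num_layers=3))  # 8x: 124 frames
```

Under these assumptions, 4× subsampling leaves roughly twice as many output frames per utterance as 8×, so each output frame covers about 40 ms instead of 80 ms.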

For convergence problems: we are working on a way to make things converge much more easily. @csukuangfj was going to commit it. It involves using a simpler model as an alignment model and, near the start of training, adding its output to that of the model being trained, with a scale less than one.
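A minimal sketch of that idea, not the actual change that was going to be committed: it assumes the scale is applied to the alignment model's output and decays linearly to zero over a warm-up period. The schedule, the default `init_scale=0.5`, and the names `alignment_scale` and `combine_outputs` are all hypothetical. Outputs are plain nested lists (frames × classes) to keep the example dependency-free.

```python
def alignment_scale(step, warmup_steps, init_scale=0.5):
    """Hypothetical schedule: decay linearly from init_scale to 0 over warmup_steps."""
    if step >= warmup_steps:
        return 0.0
    return init_scale * (1.0 - step / warmup_steps)


def combine_outputs(main_out, ali_out, scale):
    """Add the scaled alignment-model output to the main model output, per frame."""
    return [
        [m + scale * a for m, a in zip(main_frame, ali_frame)]
        for main_frame, ali_frame in zip(main_out, ali_out)
    ]


main_out = [[1.0, 2.0]]  # one frame, two classes, from the model being trained
ali_out = [[2.0, 4.0]]   # same shape, from the simpler alignment model

scale = alignment_scale(step=0, warmup_steps=1000)   # 0.5 at the first step
print(combine_outputs(main_out, ali_out, scale))      # [[2.0, 4.0]]
```

Once `step` reaches `warmup_steps` the scale is zero and the combined output is just the main model's output, so the alignment model only influences the early part of training.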