k2-fsa / snowfall

Moved to https://github.com/k2-fsa/icefall
Apache License 2.0

Speeding up training #165

Open danpovey opened 3 years ago

danpovey commented 3 years ago

After looking at the nsys output, I think we are largely limited by the latency of sequential operations in IntersectDevice, IntersectDense, GetForwardScores and GetBackwardScores (and by the memory transfer when we invoke Array1::Back()). I think there are two ways we can significantly reduce the time taken: