kaldi-asr / kaldi

kaldi-asr/kaldi is the official location of the Kaldi project.
http://kaldi-asr.org

History state info lost in the nnet computer in cuda-decoder, which causes accuracy to decrease #4189

Open housebaby opened 4 years ago

housebaby commented 4 years ago

In the cuda-decoder, it seems that if an RNN model is used, the NnetComputer is re-initialized for each chunk. That's fine for a non-recurrent network, but for a recurrent network it means the history information (like the cell state of an LSTM) is lost. Although we can recover the accuracy by appending left context (e.g. 40 frames), the RealTimeX decreases and the latency increases, which can make real-time decoding impossible, since the latency may become larger than the chunk size. Is it possible to use the previous chunk's LSTM state to initialize the current NnetComputer, like DecodableAmNnetLoopedOnline does?
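To make the state loss concrete, here is a toy C++ sketch (not Kaldi code; `RecurrentState` and `StepFrame` are invented stand-ins) showing that re-creating the recurrent state for every chunk yields different outputs than carrying it across chunks:

```cpp
#include <iostream>
#include <vector>

// Stand-in for one recurrent layer's state (e.g. an LSTM's c_t and h_t).
struct RecurrentState {
  float cell = 0.0f;
  float hidden = 0.0f;
};

// Stand-in for one frame of forward computation: the output depends on
// both the current input and the accumulated state.
float StepFrame(float input, RecurrentState *state) {
  state->cell = 0.5f * state->cell + input;  // toy recurrence
  state->hidden = state->cell;
  return state->hidden;
}

int main() {
  const std::vector<std::vector<float>> chunks = {{1, 1, 1}, {1, 1, 1}};

  // What the cuda-decoder pipeline does: fresh state per chunk, so the
  // second chunk repeats the first chunk's outputs instead of continuing.
  for (const auto &chunk : chunks) {
    RecurrentState state;  // history is lost here
    for (float f : chunk) std::cout << StepFrame(f, &state) << " ";
  }
  std::cout << "\n";

  // What a looped decodable does: one state carried across all chunks.
  RecurrentState state;
  for (const auto &chunk : chunks)
    for (float f : chunk) std::cout << StepFrame(f, &state) << " ";
  std::cout << "\n";
  return 0;
}
```

The per-chunk loop prints the first chunk's outputs twice, while the carried-state loop keeps converging; that lost history is exactly what appending extra left context tries to approximate.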

|  | #1 | #2 | #3 | #4 | #5 | #6 | #7 | #8 | baseline (online2-wav-nnet3-latgen-faster) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| frames-per-chunk | 12 | 21 | 21 | 6 | 21 | 21 | 12 | 30 | — |
| extra-left-context | 0 | 0 | 40 | 40 | 140 | 40 | 40 | 40 | 0 |
| cuda-use-tensor-cores | T | T | T | T | T | F | F | F | F |
| RealTimeX | 391.8295 | 389.701 | 347.0624 | 146.4589 | 176.2909 | 275.510625 | 189.78275 | 330.1645 | — |
| Latencies | 0.09675 | 0.124875 | 0.445125 | 4.870625 | 3.51 | 1.26 | 3.104 | 0.6005 | — |
| Latencies-90% | 0.157 | 0.188875 | 0.65275 | 7.1775 | 5.06775 | 1.840625 | 4.5725 | 0.8495 | — |
| Latencies-95% | 0.207625 | 0.203875 | 0.724375 | 8.114875 | 5.773375 | 2.059 | 5.162125 | 0.932375 | — |
| Latencies-99% | 0.269 | 0.223875 | 0.875125 | 9.986 | 7.023 | 2.51075 | 6.35875 | 1.1115 | — |
| Character accuracy (字准) | 90.04% | 92.30% | 94.77% | 94.01% | 94.71% | 94.81% | 94.48% | 94.69% | 94.99% |
| Sentence accuracy (句准) | 71.34% | 77.31% | 82.95% | 81.12% | 82.85% | 83.00% | 82.22% | 82.83% | 83.40% |

Is it possible to make the cuda decoder work in a looped way? @hugovbraun @danpovey

hugovbraun commented 4 years ago

You're right, the current neural net context switch mechanism of the online pipeline has been designed for CNN-based networks.

Regarding relying on the inner state of a looped computer, we cannot really do that because two batches are always different. Batch 1 may contain chunks from utt2, utt7, and utt4, while batch 2 may contain chunks from utt3, utt4 and utt7. The inner state is per batch slot and everything would get mixed up.

We could have a version for RNN-based models, by storing/restoring the inner output of the LSTM cells. We would just need a way to get those tensors, something like computer->GetInnerOutput() or GetLSTMCell(). Or maybe we can just do the context switch using GetOutput() and SetInput() if the RNN is not compiled as a loop? @danpovey what do you think?
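For illustration, a minimal sketch of that GetOutput()/SetInput() round trip, assuming a hypothetical wrapper that exposes each recurrent node's state as named inputs/outputs (none of these names exist in Kaldi today):

```cpp
#include <map>
#include <string>
#include <vector>

using Tensor = std::vector<float>;  // stand-in for a CuMatrix/CuVector

// Hypothetical wrapper: the real NnetComputer has no such named-state API.
struct StateAwareComputer {
  std::map<std::string, Tensor> io;  // toy storage for named tensors
  void SetInput(const std::string &name, const Tensor &t) { io[name] = t; }
  Tensor GetOutput(const std::string &name) { return io[name]; }
  void Run() { /* the unrolled forward pass would run here */ }
};

// Saved recurrent tensors for one utterance, keyed by node name,
// e.g. "lstm1.c" and "lstm1.h"; pre-seed it with zeroed entries.
using UttState = std::map<std::string, Tensor>;

void ProcessChunk(StateAwareComputer *computer, const Tensor &features,
                  UttState *state) {
  computer->SetInput("input", features);
  // Restore the previous chunk's recurrent state (zeros for chunk 0).
  for (const auto &kv : *state)
    computer->SetInput(kv.first + "_in", kv.second);
  computer->Run();
  // Save this chunk's final recurrent state for the next call.
  for (auto &kv : *state)
    kv.second = computer->GetOutput(kv.first + "_out");
}
```

The design point is that the network itself stays non-looped: the recurrence is cut at chunk boundaries and stitched back together by the pipeline.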

housebaby commented 4 years ago


@hugovbraun Now I see, thank you very much. Do you have any plans to support a looped mode for RNNs in the cuda decoder?

danpovey commented 4 years ago

Regarding the RNN stuff: sorry, I don't have the bandwidth to work on that. It's quite complicated, and I'm spending what energy I have on next-gen stuff, e.g. the k2 project.


hugovbraun commented 4 years ago

Are you moving away from RNN-type models completely? Or are you just saying that nnet3 will soon be deprecated and we should do that work on the next-gen stuff instead (e.g. with PyTorch running the neural net, or something else)?

danpovey commented 4 years ago

I'm not against RNNs, it's just that I don't have the bandwidth right now to handle the stuff required to do RNNLMs efficiently on GPU, and also work on the next-gen stuff (yes, that will involve pytorch for the neural net).


hugovbraun commented 4 years ago

I understand. We could actually look at working on it ourselves; it's just a matter of knowing what to do. The context-switch mechanism for RNNs would be fairly straightforward, but the big question is whether we should make it compatible with nnet3, or whether nnet3 is going to be outdated soon. @housebaby it wouldn't be a loop mode, because that would require a static batch slot per audio channel; long story short, it would run with batch size 1. However, we can have the exact same RNN network run in non-loop mode and add a context-switch mechanism on top of it, similar to what we do with CNNs but in an RNN-friendly way. It would be transparent to the user.
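A rough sketch of what such a per-channel context switch could look like (all names hypothetical, not actual Kaldi API; a real implementation would keep the tensors in device memory), gathering each channel's saved recurrent state before a batch and scattering it back afterwards:

```cpp
#include <cstdint>
#include <unordered_map>
#include <vector>

using Tensor = std::vector<float>;             // stand-in for device memory
struct ChannelState { Tensor cell, hidden; };  // one layer shown for brevity

class RnnContextSwitcher {
 public:
  // Before launching a batch: load each channel's saved state into its
  // (temporary) batch slot; a channel seen for the first time gets an
  // empty/zero state via operator[].
  std::vector<ChannelState> Gather(const std::vector<int32_t> &channels) {
    std::vector<ChannelState> slots;
    slots.reserve(channels.size());
    for (int32_t ch : channels) slots.push_back(saved_[ch]);
    return slots;
  }

  // After the batch: write each slot's final state back to its channel,
  // so the next batch may place that channel in any slot.
  void Scatter(const std::vector<int32_t> &channels,
               const std::vector<ChannelState> &slots) {
    for (size_t i = 0; i < channels.size(); ++i)
      saved_[channels[i]] = slots[i];
  }

 private:
  std::unordered_map<int32_t, ChannelState> saved_;
};
```

Because the state is keyed by channel rather than by batch slot, utt4 can land in slot 2 of one batch and slot 5 of the next without anything getting mixed up.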