Open xfwu opened 8 years ago
Here, is CMVN computed for each speaker?
Yes. From what I've seen, the current Eesen performs best with per-speaker CMVN.
Best
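For readers unfamiliar with the term, here is a minimal sketch of per-speaker CMVN as discussed above: frames from all of a speaker's utterances are pooled, and every utterance of that speaker is normalized with the pooled mean and standard deviation. The function name `per_speaker_cmvn` and the dictionary layout are assumptions made for illustration, not Eesen's or Kaldi's actual code, and plain Python lists are used only to keep the sketch dependency-free.

```python
import math
from collections import defaultdict

def per_speaker_cmvn(feats_by_utt, utt2spk, eps=1e-8):
    """Normalize every utterance with the mean/variance of its speaker.

    feats_by_utt: dict utt_id -> list of frames, each frame a list of floats
    utt2spk:      dict utt_id -> speaker id
    (Both layouts are hypothetical, chosen for this sketch.)
    """
    # Pool all frames that belong to the same speaker.
    frames_by_spk = defaultdict(list)
    for utt, frames in feats_by_utt.items():
        frames_by_spk[utt2spk[utt]].extend(frames)

    # Per-speaker mean and standard deviation for each feature dimension.
    stats = {}
    for spk, frames in frames_by_spk.items():
        n = len(frames)
        columns = list(zip(*frames))
        means = [sum(col) / n for col in columns]
        stds = [math.sqrt(max(sum((x - m) ** 2 for x in col) / n, eps))
                for col, m in zip(columns, means)]
        stats[spk] = (means, stds)

    # Apply the speaker's statistics to each of that speaker's utterances.
    normalized = {}
    for utt, frames in feats_by_utt.items():
        means, stds = stats[utt2spk[utt]]
        normalized[utt] = [[(x - m) / s for x, m, s in zip(frame, means, stds)]
                           for frame in frames]
    return normalized
```

In practice the same computation would normally be done on NumPy arrays or with the toolkit's own CMVN tools; the point here is only that the statistics are shared across a speaker's utterances rather than computed per utterance.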
Thanks. Have you tested EESEN with Uni-LSTM layers? With Uni-LSTM, I get a bad WER on swbd.
Sorry, I don't have such results. Did you compare it with Bi-LSTM? How different are they?
For Bi-LSTM, CTC (EESEN) gets a WER similar to CE (Kaldi_nnet1) on swbd. But for Uni-LSTM, the absolute WER difference between CTC and CE is 3.5%.
Thank you very much, Baylor.
Hi baylor0118,
I wonder (I am not an expert in ASR): have you seen anyone try something like a semi-uniLSTM?
By semi-uniLSTM I mean that, for each frame, we use the full LSTM context from the left and only a limited span (say, half a second) of LSTM context from the right.
Best
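To make the semi-uniLSTM idea above concrete, here is a small sketch of which frames such a model could see at each time step: the full left history plus a bounded right lookahead. The helper names and the 10 ms frame shift are assumptions for illustration; this only shows the indexing, not a network implementation.

```python
def frames_for_seconds(seconds, frame_shift_ms=10):
    """Convert a lookahead in seconds to a number of frames.

    The 10 ms frame shift is an assumed, commonly used value.
    """
    return int(round(seconds * 1000 / frame_shift_ms))

def visible_range(t, num_frames, right_context):
    """Inclusive (start, end) frame range usable when scoring frame t."""
    if not 0 <= t < num_frames:
        raise ValueError("frame index out of range")
    # Full left history, but at most `right_context` future frames.
    return 0, min(t + right_context, num_frames - 1)

# Half a second of right context at a 10 ms frame shift.
lookahead = frames_for_seconds(0.5)
first = visible_range(0, 300, lookahead)
near_end = visible_range(280, 300, lookahead)
```

With this scheme the output for frame `t` can be produced once frame `t + right_context` has arrived, so the added latency is bounded by the lookahead rather than by the utterance length.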
@xfwu Yeah, that's OK. For Bi-LSTM, we do it this way.
Great! Can I ask how this semi-uniLSTM performs?
Best
Here you can find an approach for computing CMVN in online applications:
http://kaldi-asr.org/doc/online_decoding.html
(See the "Cepstral mean and variance normalization in online decoding" section.)
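As a rough illustration of the general idea behind online CMVN, here is a simplified sliding-window sketch that normalizes each incoming frame using only the most recent frames seen so far. It is not the algorithm from the linked Kaldi page, which handles more cases; the class name `SlidingWindowCmvn` and the default window of 600 frames are assumptions for this example.

```python
import math
from collections import deque

class SlidingWindowCmvn:
    """Causal CMVN over the most recent `window` frames (simplified sketch)."""

    def __init__(self, dim, window=600, eps=1e-8):
        self.window = window      # illustrative default, in frames
        self.eps = eps            # variance floor to avoid division by zero
        self.frames = deque()
        self.sum = [0.0] * dim
        self.sumsq = [0.0] * dim

    def accept(self, frame):
        """Add one frame and return it normalized by the current window stats."""
        self.frames.append(frame)
        for d, x in enumerate(frame):
            self.sum[d] += x
            self.sumsq[d] += x * x
        # Drop the oldest frame once the window is exceeded.
        if len(self.frames) > self.window:
            old = self.frames.popleft()
            for d, x in enumerate(old):
                self.sum[d] -= x
                self.sumsq[d] -= x * x
        n = len(self.frames)
        out = []
        for d, x in enumerate(frame):
            mean = self.sum[d] / n
            var = max(self.sumsq[d] / n - mean * mean, self.eps)
            out.append((x - mean) / math.sqrt(var))
        return out
```

Because the statistics depend only on past frames, no waiting for the end of the utterance is needed. The first few frames are normalized with very little data, which is one reason practical systems tend to seed the statistics from earlier audio or global averages.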
Hi Yajie
Thank you very much for your kind reply.
I want to ask another question. The current Eesen performs best with CMVN and Bi-LSTM, but in a real scenario it would take too long to wait for the whole utterance to finish, and CMVN statistics might not be available (for example, when the user changes location, or when the user changes altogether). What is the best strategy in such a situation?
All the best
Xiaofeng