srvk / eesen

The official repository of the Eesen project
http://arxiv.org/abs/1507.08240
Apache License 2.0

decoding without cmvn #70

Open xfwu opened 8 years ago

xfwu commented 8 years ago

Hi Yajie

Thank you very much for your kind reply.

I want to ask another question: Eesen currently performs best with CMVN and a BiLSTM, but in a real scenario it would take too long to wait for the whole utterance to finish, and CMVN statistics might not be available (for example, when the user changes location, or when the user changes altogether). What is the best strategy in such a situation?

All the best

Xiaofeng

double22a commented 8 years ago

Here, do you mean CMVN computed per speaker?

xfwu commented 8 years ago

Yes, from what I've seen, Eesen currently performs best with per-speaker CMVN.

Best
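For readers unfamiliar with the term, per-speaker CMVN can be sketched as below. This is a minimal pure-Python illustration, not Eesen's or Kaldi's implementation; the function name `speaker_cmvn` and the nested-list feature layout are assumptions made for the example.

```python
import math

def speaker_cmvn(utterances):
    """Normalize all frames of one speaker to zero mean and unit variance.

    utterances: list of utterances; each utterance is a list of frames;
    each frame is a list of floats (one value per feature dimension).
    """
    # Pool every frame of the speaker, regardless of utterance.
    frames = [frame for utt in utterances for frame in utt]
    n = len(frames)
    dim = len(frames[0])
    # Per-dimension mean and variance over the pooled frames.
    mean = [sum(f[d] for f in frames) / n for d in range(dim)]
    var = [sum((f[d] - mean[d]) ** 2 for f in frames) / n for d in range(dim)]
    # Guard against constant dimensions, which would divide by zero.
    std = [math.sqrt(v) if v > 0 else 1.0 for v in var]
    return [[[(f[d] - mean[d]) / std[d] for d in range(dim)] for f in utt]
            for utt in utterances]
```

The statistics depend on every frame from the speaker, which is why this form of normalization is awkward when the speaker is unknown or the audio is still arriving.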

double22a commented 8 years ago

Thanks. Have you tested EESEN with Uni-LSTM layers? With Uni-LSTM, I get a bad WER on swbd.

xfwu commented 8 years ago

Sorry, I don't have such results. Did you compare it with Bi-LSTM? How different are they?

double22a commented 8 years ago

With Bi-LSTM, CTC (EESEN) gets a WER similar to CE (Kaldi_nnet1) on swbd. With Uni-LSTM, however, the absolute WER difference between CTC and CE is 3.5%.

xfwu commented 8 years ago

Thank you very much, Baylor.

xfwu commented 8 years ago

Hi baylor0118,

I wonder (I am not an expert in ASR): have you seen anyone try something like a semi-uniLSTM?

By semi-uniLSTM I mean that, for each frame, we use the full LSTM context from the left and only a limited span (say, half a second) of LSTM context from the right.

Best
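The limited-right-context idea described in this comment can be sketched as follows. This only enumerates which frames each direction would see; it is not an LSTM and not Eesen code. The name `semi_uni_spans` is hypothetical, and the 10 ms frame shift used to turn half a second into 50 frames is an assumption.

```python
def semi_uni_spans(num_frames, lookahead):
    """Yield (t, forward_span, backward_span) for each frame t.

    forward_span:  frames consumed left to right, the whole history 0..t.
    backward_span: frames consumed right to left, starting at most
                   `lookahead` frames ahead of t and ending at t.
    """
    for t in range(num_frames):
        # The right context is clipped at the end of the utterance.
        last = min(num_frames - 1, t + lookahead)
        forward_span = range(0, t + 1)
        backward_span = range(last, t - 1, -1)
        yield t, forward_span, backward_span

FRAME_SHIFT_MS = 10   # assumption: 10 ms between frames
LOOKAHEAD_MS = 500    # "half a second" of right context
LOOKAHEAD_FRAMES = LOOKAHEAD_MS // FRAME_SHIFT_MS  # 50 frames
```

Under these assumptions the output for frame t can be produced once frame t + 50 has arrived, so the added latency is bounded by the lookahead rather than by the utterance length.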

double22a commented 8 years ago

@xfwu Yes, that works. For Bi-LSTM, we do it this way.

xfwu commented 8 years ago

Great! Can I ask how this semi-uniLSTM performs?

Best

ramonsanabria commented 7 years ago

Here you can find an approach for computing CMVN in online applications:

http://kaldi-asr.org/doc/online_decoding.html

(See the "Cepstral mean and variance normalization in online decoding" section.)
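
As a rough illustration of the general idea of online CMVN, the sketch below normalizes each incoming frame using statistics from a sliding window of the most recent frames, so nothing from the future is needed. It is not Kaldi's implementation: the class name `SlidingCmvn`, its parameters, and the default window size are hypothetical choices for this example.

```python
import math
from collections import deque

class SlidingCmvn:
    """Causal mean/variance normalization over the last `window` frames."""

    def __init__(self, dim, window=600):
        self.dim = dim
        self.window = window          # illustrative default, not a tuned value
        self.frames = deque()         # frames currently inside the window
        self.sum = [0.0] * dim        # running per-dimension sum
        self.sumsq = [0.0] * dim      # running per-dimension sum of squares

    def process(self, frame):
        """Add one frame and return its normalized version."""
        self.frames.append(frame)
        for d, x in enumerate(frame):
            self.sum[d] += x
            self.sumsq[d] += x * x
        # Drop the oldest frame once the window is full.
        if len(self.frames) > self.window:
            old = self.frames.popleft()
            for d, x in enumerate(old):
                self.sum[d] -= x
                self.sumsq[d] -= x * x
        n = len(self.frames)
        out = []
        for d, x in enumerate(frame):
            mean = self.sum[d] / n
            var = max(self.sumsq[d] / n - mean * mean, 0.0)
            # Fall back to unit scale when the variance is (near) zero.
            std = math.sqrt(var) if var > 1e-10 else 1.0
            out.append((x - mean) / std)
        return out
```

The first few frames are normalized with very little data, which is the weak point of any such scheme; seeding the running sums with statistics from earlier audio is one possible mitigation.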