k2-fsa / icefall

https://k2-fsa.github.io/icefall/
Apache License 2.0

conformer-ctc based alignment model insert silence #917

Open oshindow opened 1 year ago

oshindow commented 1 year ago

Hi, all

Can the conformer-ctc phone-based ASR model be trained with a training graph built from a lexicon that has optional silence in it? Then there would be some extra silence tokens in the text, like what Kaldi's align-equal-compiled produces, so if we decode this model to get the alignment result, we could also get silence tokens.

Or could we add some silence tokens into the text directly? But this method may not handle consecutive silence well.
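
As a rough sketch of that second idea (the helper name and the "SIL" symbol here are only assumptions for illustration, not icefall code):

```python
# Hypothetical sketch: insert a SIL token between words and at both ends of
# each transcript before building the CTC training targets.
def insert_sil(words, sil_token="SIL"):
    out = [sil_token]
    for w in words:
        out.append(w)
        out.append(sil_token)
    return out


print(insert_sil(["HELLO", "WORLD"]))
# -> ['SIL', 'HELLO', 'SIL', 'WORLD', 'SIL']
```

A fixed insertion like this forces exactly one SIL at every word boundary, so it cannot model variable-length or consecutive silence.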

csukuangfj commented 1 year ago

Please see https://github.com/k2-fsa/icefall/tree/master/egs/librispeech/ASR/tdnn_lstm_ctc

For phone-based modeling, it inserts an optional SIL token at the beginning and end of each word. Please see https://github.com/k2-fsa/icefall/blob/c51e6c5b9c8e4d92b9e810e26202c3b3b633c519/egs/librispeech/ASR/local/prepare_lang.py#L268-L270
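
The idea behind those lines can be sketched roughly as follows (a minimal sketch, not the actual prepare_lang.py code; the function name and exact arc layout are illustrative). Each word may optionally be followed by SIL with probability sil_prob, and the utterance may start with an optional SIL:

```python
import math

# Rough sketch of a Kaldi-style lexicon FST with optional silence.
# States: start_state -> loop_state (no SIL) or sil_state -> loop_state (SIL).
def lexicon_to_fst_arcs(lexicon, token2id, word2id,
                        sil_token="SIL", sil_prob=0.5):
    eps = 0
    sil_score = math.log(sil_prob)          # scores are log-probs in k2
    no_sil_score = math.log(1.0 - sil_prob)

    start_state, loop_state, sil_state = 0, 1, 2
    next_state = 3
    arcs = []

    # Optional silence at the start of the utterance.
    arcs.append([start_state, loop_state, eps, eps, no_sil_score])
    arcs.append([start_state, sil_state, eps, eps, sil_score])
    arcs.append([sil_state, loop_state, token2id[sil_token], eps, 0.0])

    for word, tokens in lexicon:
        cur_state = loop_state
        word_label = word2id[word]
        for i, tok in enumerate(tokens[:-1]):
            arcs.append([cur_state, next_state, token2id[tok],
                         word_label if i == 0 else eps, 0.0])
            cur_state = next_state
            next_state += 1
        # Last token of the word: either return to loop_state (no silence)
        # or go to sil_state (optional silence after the word).
        out = word_label if len(tokens) == 1 else eps
        last = token2id[tokens[-1]]
        arcs.append([cur_state, loop_state, last, out, no_sil_score])
        arcs.append([cur_state, sil_state, last, out, sil_score])

    # These arcs would then be turned into a k2.Fsa (e.g. via its textual
    # format); the exact construction depends on the k2 version.
    return arcs
```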

oshindow commented 1 year ago

> Please see https://github.com/k2-fsa/icefall/tree/master/egs/librispeech/ASR/tdnn_lstm_ctc

Great suggestion! I combined the conformer-ctc model with this phone-based graph_compiler, and the model converges very well. But there are still no silence tokens in the alignment result, because CTC's peaky behavior removes them. Is there any method to solve this problem?
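
To illustrate the peaky behavior, here is a small diagnostic sketch (all names and shapes are assumptions: `log_probs` of shape (T, vocab) from the CTC head, blank id 0, and a known SIL id) that counts how many frames the best path assigns to blank versus SIL; with peaky CTC the blank typically absorbs nearly all non-speech frames:

```python
import torch

# Hypothetical diagnostic: count how many frames the per-frame argmax (the
# CTC "best path") assigns to blank vs. SIL.
def count_blank_vs_sil(log_probs: torch.Tensor,
                       blank_id: int = 0, sil_id: int = 1):
    best = log_probs.argmax(dim=-1)          # token id per frame
    n_blank = int((best == blank_id).sum())
    n_sil = int((best == sil_id).sum())
    print(f"{n_blank} frames -> blank, {n_sil} frames -> SIL, "
          f"out of {log_probs.size(0)} frames")
    return n_blank, n_sil


# Toy call with random posteriors, just to show the usage.
count_blank_vs_sil(torch.randn(100, 80).log_softmax(dim=-1))
```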