oshindow opened this issue 1 year ago
Please see https://github.com/k2-fsa/icefall/tree/master/egs/librispeech/ASR/tdnn_lstm_ctc
For phone-based modeling, it inserts an optional SIL token at the beginning and end of each word. Please see https://github.com/k2-fsa/icefall/blob/c51e6c5b9c8e4d92b9e810e26202c3b3b633c519/egs/librispeech/ASR/local/prepare_lang.py#L268-L270
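To make the effect of an optional-silence lexicon concrete, here is a minimal sketch (not icefall's actual API; `expand_with_optional_sil` and `SIL` are illustrative names) that enumerates the phone sequences a training graph with optional SIL at word boundaries would accept:

```python
# Hypothetical sketch: at each word boundary the training graph may
# either emit SIL or skip it, so one transcript maps to many phone paths.
from itertools import product

SIL = "SIL"

def expand_with_optional_sil(words, lexicon):
    """Yield every phone sequence accepted by a graph that allows an
    optional SIL before, between, and after the words."""
    slots = len(words) + 1  # one optional SIL slot per boundary
    for mask in product([False, True], repeat=slots):
        seq = []
        for i, word in enumerate(words):
            if mask[i]:
                seq.append(SIL)
            seq.extend(lexicon[word])
        if mask[-1]:
            seq.append(SIL)
        yield seq

lexicon = {"hello": ["HH", "AH", "L", "OW"], "world": ["W", "ER", "L", "D"]}
paths = list(expand_with_optional_sil(["hello", "world"], lexicon))
# 2 words -> 3 boundaries -> 8 alternative phone sequences
```

During training, the CTC loss sums over all of these alternatives, so the model is free to use or skip SIL at every boundary.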
Great suggestion! I combined the Conformer-CTC model with this phone-based `graph_compiler`, and the model converges very well. However, there is still no silence token in the alignment result, because CTC's peaky behavior removes it. Are there any methods to solve this problem?
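The peaky behavior can be seen in the standard CTC collapse rule itself. A minimal sketch (the function name and labels are illustrative) of greedy CTC decoding: merge repeats, then drop blanks. A peaky model tends to assign silence frames to the blank symbol rather than SIL, so no SIL token survives the collapse even though the lexicon allows one:

```python
# Minimal sketch of CTC greedy-decode post-processing.
BLANK = "<blk>"

def ctc_collapse(frame_labels, blank=BLANK):
    """Merge consecutive repeats and remove blanks (standard CTC rule)."""
    out = []
    prev = None
    for lab in frame_labels:
        if lab != prev and lab != blank:
            out.append(lab)
        prev = lab
    return out

# A peaky model spends the silence region on <blk>, not SIL:
frames = [BLANK, "HH", BLANK, BLANK, "AH", BLANK, BLANK, BLANK, "W", BLANK]
print(ctc_collapse(frames))  # ['HH', 'AH', 'W'] -- no silence token appears
```

Because the blank is always a cheaper way to absorb non-speech frames, the optional-SIL paths in the training graph rarely win during decoding.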
Hi all,
Can the Conformer-CTC phone-based ASR model be trained with a training graph built from a lexicon that has optional silence in it? That way, extra silence tokens would appear in the text, like what Kaldi's align-equal-compiled produces, so decoding this model for alignment would yield silence tokens.

Or could we add silence tokens into the text directly? But that method may not handle continuous silence well.
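For reference, here is a minimal sketch of the second idea (the function name is illustrative): inserting exactly one SIL token between consecutive words of the transcript. As noted above, this fixes the number of silence tokens in advance, so a long pause still has to map to a single SIL occurrence unless the topology allows the token to repeat:

```python
# Hypothetical sketch of the "insert SIL into the text" workaround.
SIL = "SIL"

def insert_sil(words):
    """Put exactly one SIL token between consecutive words."""
    out = []
    for i, word in enumerate(words):
        if i > 0:
            out.append(SIL)
        out.append(word)
    return out

print(insert_sil(["hello", "world", "again"]))
# ['hello', 'SIL', 'world', 'SIL', 'again']
```

This forces the model to emit SIL at every boundary, including boundaries with no acoustic silence, which is why an optional-silence training graph is usually preferable.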