Closed yuseungwoo closed 1 year ago
First of all, thanks for your great work and code!
I am studying SpeechLM and have some questions about training and inference.

1. Can you tell me which stage of the script you used for training the phoneme tokenizer? Everything below #L155, as I expected? [https://github.com/kaldi-asr/kaldi/blob/master/egs/librispeech/s5/run.sh#L155]
2. Can you tell me which decoder is used for pseudo-label generation, and share your command? steps/decode_fmllr.sh, or online2-wav-gmm-latgen-faster directly?

Best Regards

Sorry for the late response.
Yes, as you expected. We trained two phoneme tokenizers in our paper: a GMM-HMM model using 100 hours of data for the Base setting, and a DNN-HMM model using 960 hours of data for the Large setting. The GMM-HMM model is exactly 'tri4b' (after stage 13). The DNN-HMM model is exactly the chain model obtained after running the whole script (after the last stage).
We used steps/decode_fmllr.sh for the GMM-HMM model.
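For reference, a minimal sketch of what the tri4b decoding step could look like with the standard Kaldi librispeech s5 recipe. The data/graph directory names and the --nj/--cmd values below are assumptions for illustration, not the authors' exact command; only the script names (utils/mkgraph.sh, steps/decode_fmllr.sh) and their argument order are standard Kaldi.

```shell
# Sketch only: decode with the 'tri4b' GMM-HMM model to produce pseudo
# phoneme labels. Assumes run.sh has been executed through stage 13 so
# that exp/tri4b exists; directory names here are illustrative.
cd kaldi/egs/librispeech/s5

# Build a decoding graph for tri4b (standard Kaldi utility:
# utils/mkgraph.sh <lang-dir> <model-dir> <graph-dir>).
utils/mkgraph.sh data/lang_test_tgsmall exp/tri4b exp/tri4b/graph_tgsmall

# Decode with fMLLR adaptation (standard Kaldi script:
# steps/decode_fmllr.sh [opts] <graph-dir> <data-dir> <decode-dir>).
steps/decode_fmllr.sh --nj 20 --cmd run.pl \
  exp/tri4b/graph_tgsmall data/train_clean_100 \
  exp/tri4b/decode_train_clean_100
```

The resulting lattices in the decode directory can then be converted to frame-level phoneme alignments for tokenizer labels.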