thu-spmi / CAT

A CRF-based ASR Toolkit
Apache License 2.0
324 stars 74 forks

Language model not used in AISHELL? #10

Closed helloword12345678 closed 4 years ago

helloword12345678 commented 4 years ago

Thanks for sharing! I am working with the AISHELL dataset, and it looks like no language model is used in the decoding process. Am I right?

helloword12345678 commented 4 years ago

What's the purpose of 'den_lm.fst'?

thu-spmi commented 4 years ago

'den_lm.fst' is needed for calculating the CRF gradients, and it is different from the word-level language model used in decoding. Please refer to [the CTC-CRF paper](http://oa.ee.tsinghua.edu.cn/~ouzhijian/pdf/ctc-crf.pdf) for details, in particular the introduction of the n-gram denominator LM of labels.
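
For readers who want the gist before opening the paper, here is a rough sketch of where the denominator LM enters. The notation is an assumption based on a reading of the CTC-CRF paper rather than on the toolkit's code: $x$ is the acoustic input of length $T$, $\pi$ a frame-level label path, $\mathcal{B}$ the CTC mapping from paths to label sequences $l$, and $p_{\mathrm{LM}}$ the n-gram LM over labels (the model that 'den_lm.fst' would encode).

```math
\begin{aligned}
p_\theta(l \mid x) &= \sum_{\pi \in \mathcal{B}^{-1}(l)} p_\theta(\pi \mid x),
\qquad
p_\theta(\pi \mid x) = \frac{\exp \phi_\theta(\pi, x)}{\sum_{\pi'} \exp \phi_\theta(\pi', x)}, \\
\phi_\theta(\pi, x) &= \log p_{\mathrm{LM}}\bigl(\mathcal{B}(\pi)\bigr) + \sum_{t=1}^{T} \log p_\theta(\pi_t \mid x), \\
\frac{\partial \log p_\theta(l \mid x)}{\partial \theta}
&= \mathbb{E}_{p_\theta(\pi \mid l, x)}\!\left[\frac{\partial \phi_\theta(\pi, x)}{\partial \theta}\right]
 - \mathbb{E}_{p_\theta(\pi' \mid x)}\!\left[\frac{\partial \phi_\theta(\pi', x)}{\partial \theta}\right].
\end{aligned}
```

Under this reading, the second expectation runs over all paths weighted by the label-level LM, which is why a denominator graph is needed to compute the gradient, independently of whichever word-level LM is applied at decoding time. The paper remains the reference for the exact definitions.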