k2-fsa / icefall

https://k2-fsa.github.io/icefall/
Apache License 2.0

Domain specific Language Model #1138

Open bsshruthi22 opened 1 year ago

bsshruthi22 commented 1 year ago

We are building an ASR system from open-source Hindi/English data using the k2-fsa/icefall LibriSpeech recipe. We need to build a language model for our domain-specific data. Please let us know how to go about this. Thank you.

bsshruthi22 commented 1 year ago

We are using the pruned stateless transducer 7 recipe (pruned_transducer_stateless7).

danpovey commented 1 year ago

Hi Sruthi, it was nice to meet you at ICASSP!

I have discussed this with the guys, we are doing 2 things about it:

In the future, we will have to investigate whether a CTC system interacts better with this kind of LM biasing than RNN-T systems do. The guys have already been running experiments that add an auxiliary CTC head to the RNN-T system. The RNN-T helps the CTC head learn better (but not vice versa), and I think the CTC WER is nearly as good as the RNN-T one.

marcoyang1998 commented 1 year ago

Hi @bsshruthi22, I'm writing documentation for decoding with language models, which should be available very soon. I will post an update here once I have made the PR.

bsshruthi22 commented 1 year ago

Thank you very much, Dan and @marcoyang1998.

bsshruthi22 commented 1 year ago

Any updates on this, @danpovey @marcoyang1998?

pkufool commented 1 year ago

@bsshruthi22 https://k2-fsa.github.io/icefall/decoding-with-langugage-models/index.html
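
For readers who want the basic idea before reading the linked docs: a common way to apply a domain-specific LM is to combine the ASR score and the LM score log-linearly, as in shallow fusion. The sketch below is not icefall code and does not use its API. It is a simplified illustration that rescores an n-best list after decoding (rather than fusing scores inside the beam search), and it assumes a toy add-alpha unigram LM in place of a real n-gram or neural LM. The names `train_unigram_lm`, `rescore_nbest`, and `lm_scale`, and all scores and sentences, are hypothetical.

```python
import math
from collections import Counter


def train_unigram_lm(sentences, alpha=1.0):
    """Toy add-alpha unigram LM over whitespace tokens.

    A stand-in for a real domain LM; returns a function mapping a
    sentence to its log-probability under the unigram model.
    """
    counts = Counter(tok for s in sentences for tok in s.split())
    total = sum(counts.values())
    vocab = len(counts) + 1  # one extra bucket for unseen words

    def logprob(sentence):
        return sum(
            math.log((counts[tok] + alpha) / (total + alpha * vocab))
            for tok in sentence.split()
        )

    return logprob


def rescore_nbest(nbest, lm_logprob, lm_scale=0.3):
    """Rank (text, asr_logprob) pairs by asr_logprob + lm_scale * lm_logprob."""
    scored = [(text, asr + lm_scale * lm_logprob(text)) for text, asr in nbest]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical in-domain text and a hypothetical 2-best list from the ASR model.
domain_text = ["book a cardiology appointment", "cardiology appointment tomorrow"]
lm = train_unigram_lm(domain_text)
nbest = [
    ("book a car geology appointment", -4.0),  # ASR prefers this one
    ("book a cardiology appointment", -4.2),
]
best_with_lm = rescore_nbest(nbest, lm)[0][0]
best_without_lm = rescore_nbest(nbest, lm, lm_scale=0.0)[0][0]
print(best_with_lm)
print(best_without_lm)
```

In practice `lm_scale` is tuned on a held-out in-domain set, and the LM would be trained on a much larger domain corpus using the same token units as the ASR model.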