mailong25 / self-supervised-speech-recognition

speech to text with self-supervised learning based on wav2vec 2.0 framework
379 stars 115 forks

how to do lexicon free decoding #51

Open vigneshgig opened 3 years ago

vigneshgig commented 3 years ago

Hi, to do lexicon-free decoding I set args.lexicon to False in wsl_decoder.py, but I got an empty string. Please explain how to do lexicon-free decoding. In wsl_decoder.py I saw this line: "lexicon free decoding can only be done with a unit language model". If so, how can I create a unit language model?

Right now I have the following problem with the current lexicon-based LM decoding. For example, lexicon.txt contains two words, tamil and ama. If I send audio of the spoken word tamil to the ASR, it predicts tamah (no LM). After language model processing, it always outputs ama. Another example: lexicon.txt now contains the words tamil and am. If the ASR predicts tamah, the language model outputs am. I just want to solve this problem. I tried many different values of the LM weight and the word group parameter, but it did not help. I know that lm_weight and word group will not have much influence in these scenarios. Thanks. Could anyone please help me out? @SenriYoshikawa @mailong25
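On the unit language model question above: one common way to get one is to train an ordinary n-gram LM on text that has been split into the acoustic model's output units (letters) instead of words. Below is a minimal sketch of that preprocessing step only. It assumes the units are single characters and that `|` marks a word boundary, as in the fairseq wav2vec 2.0 letter dictionaries; `words_to_units` and `convert_corpus` are hypothetical helper names, not part of this repo, and whether wsl_decoder.py accepts the resulting model has not been verified here.

```python
from typing import Iterable, List


def words_to_units(line: str, word_sep: str = "|") -> str:
    """Turn 'tamil ama' into 't a m i l | a m a |'.

    Assumption: every character is one unit and `word_sep` is the
    word-boundary token used by the acoustic model's dictionary.
    Casing is left untouched; it must match that dictionary.
    """
    words = line.split()
    return " ".join(" ".join(word) + " " + word_sep for word in words)


def convert_corpus(lines: Iterable[str]) -> List[str]:
    """Convert a word-level corpus to unit level, skipping blank lines."""
    return [words_to_units(line) for line in lines if line.strip()]


if __name__ == "__main__":
    # Toy corpus built from the words mentioned in this issue.
    for unit_line in convert_corpus(["tamil ama", "am"]):
        print(unit_line)
```

The converted text could then be fed to an n-gram toolkit such as KenLM (for example `lmplz -o 6 < units.txt > unit_lm.arpa`; the order 6 is only an illustrative choice, since letter-level models usually need a higher order than word-level ones). The token set in the converted text has to match the acoustic model's dictionary exactly, otherwise the decoder may still return empty output.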