k2-fsa / icefall

https://k2-fsa.github.io/icefall/
Apache License 2.0

How to use an external RNN-LM (mono-lingual) with a bilingual ASR? #1569

Closed sangeet2020 closed 4 weeks ago

sangeet2020 commented 1 month ago

Hi K2 team,

Thank you so much for your amazingly efficient toolkit for streaming-focused ASR.

I have trained an EN-DE bilingual streaming ASR model using this recipe. However, I am not satisfied with its performance on English, and I want to use an externally trained RNN LM (trained using this recipe) to improve the WER on the English side only.

I tried --decoding-method modified_beam_search_lm_shallow_fusion with the English RNN-LM, but ran into errors caused by mismatched vocabulary sizes: the bilingual ASR model was trained with a vocab size of 1000 (500 for EN and 500 for DE), while the English RNN-LM uses a vocab size of 500.

I wonder if it's possible to use a monolingual RNN LM with a bilingual ASR model.

Alternatively, is it possible to combine two RNN-LMs, or somehow interpolate them? I saw some related discussions here: https://github.com/kaldi-asr/kaldi/issues/2069.

Thank you

marcoyang1998 commented 1 month ago

I think it's possible as long as the German BPE tokens and the English BPE tokens are distinguishable.

You also need to know which language you are decoding; otherwise you might end up rescoring a German utterance with the English RNNLM.
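To make the "which language" check concrete, here is a minimal sketch. It assumes the bilingual vocabulary can be split into a known set of English token IDs plus a few shared special IDs (blank, unknown, and so on). The function names, the ID sets, and the 0.9 threshold are hypothetical and not part of icefall; the real ID layout depends on how the bilingual BPE model was built.

```python
from typing import Iterable, Set


def english_fraction(
    token_ids: Iterable[int], en_ids: Set[int], shared_ids: Set[int]
) -> float:
    """Fraction of language-specific tokens in a hypothesis that are English.

    Shared special tokens (e.g. blank/unk) are ignored because they say
    nothing about the language.
    """
    counted = [t for t in token_ids if t not in shared_ids]
    if not counted:
        return 0.0
    return sum(1 for t in counted if t in en_ids) / len(counted)


def should_apply_english_lm(
    token_ids: Iterable[int],
    en_ids: Set[int],
    shared_ids: Set[int],
    threshold: float = 0.9,  # hypothetical value; tune on held-out data
) -> bool:
    """Return True if the hypothesis looks English enough to rescore."""
    return english_fraction(token_ids, en_ids, shared_ids) >= threshold
```

If the language of each utterance is already known from metadata, that is a more reliable gate than guessing from the decoded tokens.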

sangeet2020 commented 1 month ago

But wouldn't the different vocab sizes of the BPE models used for the ASR model and the RNN-LM create an issue in the first place?

When loading the RNN LM:

            model = RnnLmModel(
                vocab_size=params.vocab_size,
                embedding_dim=params.rnn_lm_embedding_dim,
                hidden_dim=params.rnn_lm_hidden_dim,
                num_layers=params.rnn_lm_num_layers,
                tie_weights=params.rnn_lm_tie_weights,
            )

params.vocab_size is the size of the SentencePiece tokenizer from the ASR model (1000 in my case), which is different from the actual RNN LM vocab size (500 in my case). How can I overcome this?

marcoyang1998 commented 1 month ago

You need to change the code. I only meant that it's theoretically possible to use a mono-lingual RNNLM to rescore a multi-lingual ASR model.
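One possible shape for such a code change is a token-ID mapping between the two vocabularies. The sketch below is not icefall code: every function name is hypothetical, and it assumes that the English pieces in the 1000-token ASR vocabulary are spelled exactly like the pieces in the 500-token RNN-LM vocabulary. If the bilingual BPE model was trained jointly, that assumption may not hold. The piece lists would come from the two tokenizers.

```python
from typing import Dict, List, Sequence


def build_asr_to_lm_map(
    asr_pieces: Sequence[str], lm_pieces: Sequence[str]
) -> List[int]:
    """For each ASR token ID, the matching LM token ID, or -1 if absent."""
    lm_index: Dict[str, int] = {p: i for i, p in enumerate(lm_pieces)}
    return [lm_index.get(p, -1) for p in asr_pieces]


def to_lm_ids(
    asr_ids: Sequence[int], asr_to_lm: Sequence[int], lm_unk_id: int
) -> List[int]:
    """Translate an ASR hypothesis into LM input IDs (unmapped -> unk)."""
    return [asr_to_lm[t] if asr_to_lm[t] >= 0 else lm_unk_id for t in asr_ids]


def expand_lm_log_probs(
    lm_log_probs: Sequence[float],
    asr_to_lm: Sequence[int],
    missing_log_prob: float,
) -> List[float]:
    """Scatter LM scores (LM vocab size) into the ASR vocab layout.

    ASR tokens the LM does not know (e.g. German pieces) receive
    `missing_log_prob`, which is an explicit design choice.
    """
    return [
        lm_log_probs[j] if j >= 0 else missing_log_prob for j in asr_to_lm
    ]
```

With a mapping like this, the RNN-LM would be built with its own vocabulary size (500 in this thread) instead of params.vocab_size, and its output would be expanded to the ASR vocabulary before the scores are combined. The value of `missing_log_prob` matters: a strongly negative value suppresses German tokens, which only makes sense when the utterance is known to be English.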