Closed csukuangfj closed 1 year ago
@csukuangfj Looks very promising. Ping me if you need an extra hand.
@ezerhouni
Thanks! I will draft a version without batch-size support. If it gives promising results, we will need your help to implement a version that supports batches.
@csukuangfj Do you have any update on this issue? I am very eager to try it out!
Yes. But the results are not good so far. I will post them tonight.
Steps for reproducing the following results:
cd egs/librispeech/ASR
git lfs install
git clone https://huggingface.co/csukuangfj/icefall-asr-librispeech-lstm-transducer-stateless2-2022-09-03
mkdir tmp3-3
cd tmp3-3
ln -s $PWD/../icefall-asr-librispeech-lstm-transducer-stateless2-2022-09-03/exp/pretrained-iter-468000-avg-16.pt epoch-99.pt
cd ..
./generate-lm.sh
for lm_scale in 0.01 0.2 0.4 ; do
./lstm_transducer_stateless2/decode.py \
--epoch 99 \
--avg 1 \
--use-averaged-model 0 \
--exp-dir ./tmp3-3 \
--max-duration 600 \
--num-encoder-layers 12 \
--rnn-hidden-size 1024 \
--decoding-method modified_beam_search2 \
--beam 8 \
--max-contexts 4 \
--ngram-lm-scale $lm_scale
done
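For intuition, the effect of `--ngram-lm-scale` in the loop above can be sketched as a weighted sum of the transducer score and the n-gram LM score when ranking hypotheses during beam search. This is an illustrative toy, not the icefall implementation; all names (`Hyp`, `prune`, the score fields) are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Hyp:
    tokens: list = field(default_factory=list)
    am_score: float = 0.0  # accumulated transducer log-prob
    lm_score: float = 0.0  # accumulated n-gram LM log-prob

def total_score(hyp: Hyp, ngram_lm_scale: float) -> float:
    # The scale weights the LM contribution; 0 disables shallow fusion.
    return hyp.am_score + ngram_lm_scale * hyp.lm_score

def prune(hyps, beam: int, ngram_lm_scale: float):
    # Keep the `beam` best hypotheses under the fused score.
    return sorted(hyps, key=lambda h: total_score(h, ngram_lm_scale),
                  reverse=True)[:beam]
```

With scale 0 the ranking is the baseline; a positive scale lets the LM reorder the beam.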
You will find the results inside ./tmp3-3/modified_beam_search2
| ngram_lm_scale | test-clean | test-other |
|---|---|---|
| 0 (baseline) | 2.73 | 7.15 |
| 0.01 | 2.74 | 7.15 |
| 0.2 | 2.76 | 7.28 |
| -0.01 | 2.73 | 7.17 |
| -0.05 | 2.75 | 7.19 |
| -0.1 | 2.77 | 7.23 |
| -0.2 | 2.83 | 7.46 |
| -0.3 | 3.01 | 7.75 |
I am using a tri-gram LM. Note that the cost on the final state of the FST is not considered.
I will recheck the code in case it contains bugs.
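On the missing final-state cost: in an n-gram FST, a hypothesis that ends in a state with no final weight should follow backoff arcs until it reaches one. A toy sketch of that finalization step, with a hypothetical state layout and made-up weights (not the PR's code):

```python
NEG_INF = float("-inf")

# final_weight[s] is the cost of ending the utterance in state s;
# a state with no entry must back off before the utterance can end.
final_weight = {0: -0.1, 1: -2.3}
backoff = {2: (1, -0.7)}  # state 2 backs off to state 1 with cost -0.7

def finalize(lm_score: float, state: int) -> float:
    """Add the final-state cost, following backoff arcs when needed."""
    cost = 0.0
    while state not in final_weight:
        if state not in backoff:
            return NEG_INF  # no way to terminate in this state
        state, w = backoff[state]
        cost += w
    return lm_score + cost + final_weight[state]
```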
@csukuangfj Thanks !
I expect that unless there is some kind of domain mismatch, we will not see much or any improvement. (Unless we try super-large LMs. I seem to remember Liyong had some experiment with a 5-gram or something like that?)
I think Liyong was using fast_beam_search + (L, or LG) in https://github.com/k2-fsa/icefall/pull/472
We have never tried to use a token-level G with modified beam search, I think.
My two cents is that we need a very large LM (like a 5-gram). I will try it tomorrow and let you know.
@glynpu Liyong did try using a token-level G with beam search. He did not make a PR, though; the results are in our weekly meeting notes (the 20th week), as follows:
The results show that we cannot get an improvement from a pruned LM.
The results came from a word-level LM. I was using KenLM at that time; here is the related code: https://github.com/glynpu/icefall/commit/3a9ff316f3601900fdff751bcc31636740c5b1a6
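KenLM itself is not shown here, but the word-level rescoring it performs can be illustrated with a tiny self-contained bigram LM. This is purely a stand-in (add-one smoothing, not KenLM's Kneser-Ney), and every name in it is hypothetical:

```python
import math
from collections import defaultdict

class Bigram:
    """Tiny add-one-smoothed bigram LM, standing in for a KenLM model."""
    def __init__(self, sentences):
        self.uni = defaultdict(int)
        self.bi = defaultdict(int)
        self.vocab = set()
        for s in sentences:
            toks = ["<s>"] + s.split() + ["</s>"]
            self.vocab.update(toks)
            for a, b in zip(toks, toks[1:]):
                self.uni[a] += 1
                self.bi[(a, b)] += 1

    def logp(self, sentence):
        """Sentence log-prob; used to rescore n-best hypotheses."""
        toks = ["<s>"] + sentence.split() + ["</s>"]
        V = len(self.vocab)
        return sum(math.log((self.bi[(a, b)] + 1) / (self.uni[a] + V))
                   for a, b in zip(toks, toks[1:]))
```

Rescoring then just adds `scale * lm.logp(hyp_text)` to each hypothesis score.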
@csukuangfj Quick update: I am testing with a 5-gram at the moment. I am getting test-clean: 2.68, test-other: 7.11.
I am still doing some tests and a more thorough review of the code.
5-gram, beam size 4:

| ngram_lm_scale | test-clean | test-other |
|---|---|---|
| 0 (baseline) | 2.73 | 7.15 |
| 0.01 | 2.74 | 7.15 |
| 0.1 | 2.68 | 7.11 |
| 0.2 | 2.68 | 7.14 |
5-gram, beam size 8:

| ngram_lm_scale | test-clean | test-other |
|---|---|---|
| 0 (baseline) | 2.72 | 7.15 |
| 0.01 | 2.71 | 7.14 |
| 0.1 | 2.71 | 7.11 |
| 0.2 | 2.68 | 7.06 |
| 0.3 | 2.74 | 7.28 |
@ezerhouni Thanks! Are you using ./generate-lm.sh to generate the 5-gram LM, or are you using an LM trained on an external dataset?
I am using ./generate-lm.sh. I am trying a 7-gram to get an idea of whether it helps.
@csukuangfj I tried a 7-gram and it seems to improve things a bit (2.67/7.03), but I am not sure it is worth it.
I think the main use-case of this is when there is a domain mismatch between the training corpus and the target domain. We can also try dividing the scores on the LM arcs by the corresponding scores given by a low-order LM estimated on the training data.
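Dividing LM arc scores by a low-order training-data LM is, in log space, a subtraction: only the *difference* between the target-domain and source-domain LM estimates biases decoding (a density-ratio style fusion). A minimal sketch, with a hypothetical `lm_scale` default:

```python
def fusion_score(am_logp: float,
                 target_lm_logp: float,
                 source_lm_logp: float,
                 lm_scale: float = 0.3) -> float:
    """Density-ratio style fusion: subtract the source-domain
    (training-data) LM log-prob from the target-domain LM log-prob,
    so a matched domain contributes nothing."""
    return am_logp + lm_scale * (target_lm_logp - source_lm_logp)
```

When the two LMs agree (no domain mismatch), the fused score reduces to the acoustic score alone.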
Sorry for the late reply. I thought I had replied last night.
I think a 7-gram is more than enough. Thanks for your experiments. The results show that the code works with an n-gram LM, though we don't gain much from it. The next step is to use it to decode with a graph constructed from lists of specific words/phrases that we want to recognize.
I agree, I think a 5-gram is enough. I was thinking of using it for detecting OOV words. I will let you know once I have more results (unless you have something in mind).
By the way, @marcoyang1998 is using the RNN-LM model that you provided for conformer CTC for shallow fusion, and he can get a WER of 2.46 on test-clean without specific tuning.
Sounds interesting! If I am not mistaken, we can't add new words on the fly to an already trained RNN-LM, can we?
The RNN-LM is at the token level, so as long as the new word can be represented by the BPE tokens, it can be rescored by the RNN-LM, I think.
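One way to see why token-level scoring covers unseen words: any word whose characters can be segmented into known subword pieces gets a token sequence the LM can score. A toy greedy longest-match segmenter illustrates this; it is a stand-in for SentencePiece/BPE, which differ in detail, and all names here are hypothetical:

```python
def bpe_like_segment(word, pieces):
    """Greedily segment `word` into the longest known subword pieces.
    Returns the piece list, or None if the word is not representable."""
    out, i = [], 0
    while i < len(word):
        # Try the longest candidate piece first.
        for j in range(len(word), i, -1):
            if word[i:j] in pieces:
                out.append(word[i:j])
                i = j
                break
        else:
            return None  # no piece matches at position i
    return out
```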
Indeed, but we can't "boost" specific words (or combinations of specific tokens).
Yes, you are right. That is why we are trying to integrate FST into decoding.
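A common way to integrate such boosting into beam search is a context graph: a trie over the token sequences of the target words, awarding a per-token bonus while a prefix keeps matching and retracting it when the match dies. A hypothetical sketch (class name, state layout, and scores are all illustrative, not this PR's code):

```python
class ContextGraph:
    """Toy keyword-boosting graph for contextual biasing."""
    def __init__(self, phrases, bonus=2.0):
        self.bonus = bonus
        self.root = {}
        for toks in phrases:          # phrases are token-sequence lists
            node = self.root
            for t in toks:
                node = node.setdefault(t, {})

    def advance(self, state, token):
        """state = (trie_node, depth); returns (new_state, score_delta)."""
        node, depth = state if state else (self.root, 0)
        if token in node:
            nxt = node[token]
            if not nxt:               # phrase completed: keep the credit
                return (self.root, 0), self.bonus
            return (nxt, depth + 1), self.bonus
        # Prefix broke: retract the bonuses accumulated so far.
        return (self.root, 0), -self.bonus * depth
```

Each hypothesis carries its own graph state, and `score_delta` is added to the fused beam-search score at every emitted token.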
@csukuangfj I have a batch version (à la modified_beam_search). I took your commits and added mine on top (with a rebase); I will create a new PR if that's OK.
Yes, thanks! I will close this PR once you create a new PR.
See #630
We have been trying to use a word-level G and LG for RNN-T decoding, but we have only tried them with fast_beam_search. However, a word-level G or LG cannot handle OOV words.
This PR tries to use a token-level G for shallow fusion with modified_beam_search. I am using OpenFst to manipulate the n-gram G on the CPU, as it is easier to implement.
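Stepping a token-level n-gram G on the CPU roughly amounts to: take the arc labeled with the new token if the current state has one; otherwise follow backoff arcs to lower-order states first, accumulating their costs. A dict-based toy of that lookup (OpenFst stores this as arcs with a special backoff label; the states and weights below are made up):

```python
NEG_INF = float("-inf")

# arcs[state][token] -> (next_state, log_prob)
arcs = {
    0: {"HE": (1, -1.0), "SHE": (2, -1.2)},  # unigram-ish state
    1: {"LLO": (0, -0.3)},
    2: {},
}
# backoff[state] -> (lower_order_state, backoff_cost)
backoff = {1: (0, -0.5), 2: (0, -0.4)}

def step(state, token):
    """Consume one token, following backoff arcs as needed."""
    cost = 0.0
    while token not in arcs[state]:
        if state not in backoff:
            return state, NEG_INF  # token unknown to the LM
        state, w = backoff[state]
        cost += w
    nxt, w = arcs[state][token]
    return nxt, cost + w
```

During modified beam search, each hypothesis would carry its G state and add the returned log-prob (times the LM scale) to its score.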