bhaswa opened this issue 11 months ago

Hi,

In icefall, there are multiple decoding methods available, e.g. greedy_search, beam_search, modified_beam_search, fast_beam_search, and fast_beam_search_nbest. There are also several decoding methods that use an LM (modified_beam_search_lm_shallow_fusion, modified_beam_search_LODR, modified_beam_search_lm_rescore, modified_beam_search_lm_rescore_LODR). But in sherpa-onnx, there are only two valid decoding methods (greedy_search and modified_beam_search). Can we use the other decoding methods in sherpa-onnx as well, just as in icefall?
I am afraid you cannot. We have implemented only greedy_search and modified_beam_search for transducer models. fast_beam_search requires k2, but sherpa-onnx does not depend on k2.
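For reference, here is a minimal sketch of selecting each of the two supported methods with the offline binary; the model file names below are placeholders, so substitute your own files:

```bash
# Sketch only: the file paths are placeholders; point them at your own model.

# greedy_search
./build/bin/sherpa-onnx-offline \
  --tokens=./tokens.txt \
  --encoder=./encoder.onnx \
  --decoder=./decoder.onnx \
  --joiner=./joiner.onnx \
  --decoding-method=greedy_search \
  ./test.wav

# modified_beam_search
./build/bin/sherpa-onnx-offline \
  --tokens=./tokens.txt \
  --encoder=./encoder.onnx \
  --decoder=./decoder.onnx \
  --joiner=./joiner.onnx \
  --decoding-method=modified_beam_search \
  --max-active-paths=4 \
  ./test.wav
```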
In case an LM is used, does that mean LODR, rescoring, or shallow fusion also cannot be used in sherpa-onnx?
No, you can use RNN LM rescoring with sherpa-onnx.
Please search for the RNN LM rescoring PR in sherpa-onnx; there are usage examples in the comments of that PR.
So if I use --lm together with --decoding-method=modified_beam_search, will it do LM rescoring by default?
You need to pass the RNN LM model.

Yes, the RNN LM needs to be provided.
https://github.com/k2-fsa/sherpa-onnx/pull/353
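For illustration, a rough sketch of what such an invocation might look like; the --lm and --lm-scale option names and the file paths are assumptions based on this discussion, so please verify the exact usage in the PR above:

```bash
# Sketch only: option names and file paths are assumptions; see the PR above
# for the exact, up-to-date usage.
./build/bin/sherpa-onnx-offline \
  --tokens=./tokens.txt \
  --encoder=./encoder.onnx \
  --decoder=./decoder.onnx \
  --joiner=./joiner.onnx \
  --decoding-method=modified_beam_search \
  --max-active-paths=4 \
  --lm=./rnn-lm.onnx \
  --lm-scale=0.5 \
  ./test.wav
```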
From the above pull request, it seems that shallow fusion is also implemented. Can you provide the usage for that as well?
https://github.com/k2-fsa/sherpa-onnx/pull/147
Please search for shallow fusion in the related PR. You can find usages in the comments.
https://github.com/k2-fsa/sherpa-onnx/pull/125

From the above PR I found the usage for LM rescore as below:

```bash
./build/bin/sherpa-onnx-offline \
  --tokens=./sherpa-onnx-zipformer-en-2023-04-01/tokens.txt \
  --encoder=./sherpa-onnx-zipformer-en-2023-04-01/encoder-epoch-99-avg-1.onnx \
  --decoder=./sherpa-onnx-zipformer-en-2023-04-01/decoder-epoch-99-avg-1.onnx \
  --joiner=./sherpa-onnx-zipformer-en-2023-04-01/joiner-epoch-99-avg-1.onnx \
  --lm-scale=0.5 \
  --num-threads=2 \
  --decoding-method=modified_beam_search \
  --max-active-paths=4 \
  ./2414-159411-0024.wav \
```
https://github.com/k2-fsa/sherpa-onnx/pull/147

From this PR I found the usage of shallow fusion as below:

```bash
./bin/sherpa-onnx \
  exp/data/lang_char_bpe/tokens.txt \
  exp/exp/encoder-epoch-99-avg-1.onnx \
  exp/exp/decoder-epoch-99-avg-1.onnx \
  exp/exp/joiner-epoch-99-avg-1.onnx \
  exp/test_wavs/BAC009S0764W0164.wav \
  2 \
  modified_beam_search \
  exp/exp/with-state-epoch-999-avg-1.onnx
```
Between the above two commands, the only difference I can find is the executable; I cannot find any difference in the arguments that would distinguish rescoring from shallow fusion.

If I want to use the Python API, how can I differentiate between rescoring and shallow fusion?
> From the above PR I found the usage for LM rescore as below:

Please take a look at the usage in the PR's comments; you have found the wrong place in the PR.
My bad. I copied the wrong segment.
But I still cannot find any difference between the arguments in https://github.com/k2-fsa/sherpa-onnx/pull/125 (LM rescoring) and https://github.com/k2-fsa/sherpa-onnx/pull/147 (shallow fusion).

I want to use the Python API. How can I choose between rescoring and shallow fusion?
> between rescoring and shallow fusion

Could you explain the difference between rescoring and shallow fusion?
In icefall, we can use an LM with either rescoring or shallow fusion.
The command for shallow fusion is:

```bash
./pruned_transducer_stateless7_streaming/decode.py \
  --epoch 99 \
  --avg 1 \
  --use-averaged-model False \
  --beam-size 4 \
  --exp-dir $exp_dir \
  --max-duration 600 \
  --decode-chunk-len 32 \
  --decoding-method modified_beam_search_lm_shallow_fusion \
  --bpe-model ./icefall-asr-librispeech-pruned-transducer-stateless7-streaming-2022-12-29/data/lang_bpe_500/bpe.model \
  --use-shallow-fusion 1 \
  --lm-type rnn \
  --lm-exp-dir $lm_dir \
  --lm-epoch 99 \
  --lm-scale $lm_scale \
  --lm-avg 1 \
  --rnn-lm-embedding-dim 2048 \
  --rnn-lm-hidden-dim 2048 \
  --rnn-lm-num-layers 3 \
  --lm-vocab-size 500
```
The command for rescoring is:

```bash
./pruned_transducer_stateless7_streaming/decode.py \
  --epoch 99 \
  --avg 1 \
  --use-averaged-model False \
  --beam-size 4 \
  --exp-dir $exp_dir \
  --max-duration 600 \
  --decode-chunk-len 32 \
  --decoding-method modified_beam_search_lm_rescore \
  --bpe-model ./icefall-asr-librispeech-pruned-transducer-stateless7-streaming-2022-12-29/data/lang_bpe_500/bpe.model \
  --use-shallow-fusion 0 \
  --lm-type rnn \
  --lm-exp-dir $lm_dir \
  --lm-epoch 99 \
  --lm-scale $lm_scale \
  --lm-avg 1 \
  --rnn-lm-embedding-dim 2048 \
  --rnn-lm-hidden-dim 2048 \
  --rnn-lm-num-layers 3 \
  --lm-vocab-size 500
```
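For a quick side-by-side comparison, the only arguments that differ between these two commands are:

```bash
# shallow fusion
--decoding-method modified_beam_search_lm_shallow_fusion  --use-shallow-fusion 1
# rescoring
--decoding-method modified_beam_search_lm_rescore         --use-shallow-fusion 0
```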
In sherpa-onnx, how can I use an LM with these two different settings? Also, with the commands given in the sherpa-onnx pull requests (https://github.com/k2-fsa/sherpa-onnx/pull/125 and https://github.com/k2-fsa/sherpa-onnx/pull/147), will the LM run with rescoring or with shallow fusion?