zhangzhengyireal opened 1 year ago
Have you tried the LODR method? Also, assuming your LG is based on Chinese words, what is the vocabulary coverage of your dev set like?
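To check that coverage, a minimal sketch of counting out-of-vocabulary tokens in the dev transcripts against the LG vocabulary (the `tokens.txt` format and the tokenization are assumptions, not the exact icefall layout):

```python
# Sketch: estimate what fraction of dev-set tokens are covered by the
# LG vocabulary. File format below ('<symbol> <id>' per line) is an
# assumption based on the usual icefall tokens.txt layout.
from collections import Counter

def load_vocab(tokens_path):
    """Read a tokens.txt-style file: one '<symbol> <id>' pair per line."""
    vocab = set()
    with open(tokens_path, encoding="utf-8") as f:
        for line in f:
            parts = line.split()
            if parts:
                vocab.add(parts[0])
    return vocab

def coverage(dev_tokens, vocab):
    """Return (covered_fraction, Counter of out-of-vocabulary tokens)."""
    oov = Counter(t for t in dev_tokens if t not in vocab)
    total = len(dev_tokens)
    covered = total - sum(oov.values())
    return covered / max(total, 1), oov
```

Since the WenetSpeech model is character-based, the dev transcripts would be split into characters (e.g. `list(text)`) before counting.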
In my experiments, I have always found the "nbest" variations to be better than the one-best versions, e.g., fast_beam_search_nbest_LG better than fast_beam_search_LG.
Usually, you would also need to tune the --beam parameter to balance insertions against deletions. It looks like you have significantly higher deletions at the moment, so you could try increasing the beam.
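For reference, the insertion/deletion balance mentioned above can be checked per utterance with a standard Levenshtein alignment; this is a generic sketch, not icefall's own scoring script:

```python
# Sketch: count substitutions, deletions, and insertions between a
# reference and a hypothesis token list via dynamic programming.
# Generic WER bookkeeping, not the scoring code used by decode.py.

def error_counts(ref, hyp):
    """Return (substitutions, deletions, insertions) for two token lists."""
    R, H = len(ref), len(hyp)
    # d[i][j] = minimal edit cost aligning ref[:i] with hyp[:j]
    d = [[0] * (H + 1) for _ in range(R + 1)]
    for i in range(1, R + 1):
        d[i][0] = i
    for j in range(1, H + 1):
        d[0][j] = j
    for i in range(1, R + 1):
        for j in range(1, H + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    # Backtrack to classify each edit operation.
    subs = dels = ins = 0
    i, j = R, H
    while i > 0 or j > 0:
        if i > 0 and j > 0 and d[i][j] == d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1]):
            subs += ref[i - 1] != hyp[j - 1]
            i, j = i - 1, j - 1
        elif i > 0 and d[i][j] == d[i - 1][j] + 1:
            dels += 1  # reference token missing from the hypothesis
            i -= 1
        else:
            ins += 1   # extra token in the hypothesis
            j -= 1
    return subs, dels, ins
```

A decoder that is too conservative (e.g. too small a beam, or a blank penalty pushing toward blanks) tends to show up here as a high deletion count relative to insertions.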
Environment:

```
Collecting environment information...
k2 version: 1.24.3
Build type: Release
Git SHA1: 42e92fdd4097adcfe9937b4d2df7736d227b8e85
Git date: Wed Jun 28 09:50:36 2023
Cuda used to build k2: 11.6
cuDNN used to build k2: 8.2.0
Python version used to build k2: 3.9
OS used to build k2: Ubuntu 20.04.6 LTS
CMake version: 3.26.4
GCC version: 7.5.0
PyTorch version used to build k2: 1.13.1+cu116
PyTorch is using Cuda: 11.6
NVTX enabled: True
With CUDA: True
Disable debug: True
Sync kernels : False
Disable checks: False
Max cpu memory allocate: 214748364800 bytes (or 200.0 GB)
k2 abort: False
```
Resource: https://huggingface.co/pkufool/icefall-asr-zipformer-streaming-wenetspeech-20230615
Testset: wenetspeech DEV
Bash command:

```bash
exp_dir=download/huggingface/icefall-asr-zipformer-streaming-wenetspeech-20230615/exp
lang_dir=download/huggingface/icefall-asr-zipformer-streaming-wenetspeech-20230615/data/lang_char
decode_method=greedy_search
#decode_method=fast_beam_search_LG

./zipformer/decode.py \
  --epoch ${ep} \
  --avg ${avg} \
  --exp-dir ${exp_dir}/ \
  --lang-dir ${lang_dir} \
  --max-duration 800 \
  --decoding-method ${decode_method} \
  --blank-penalty ${blank_penalty} \
  --ngram-lm-scale ${nls} \
  --ilme-scale ${ilme_scale} \
  --manifest-dir data/fbank/ \
  --causal 1 \
  --chunk-size ${chunk_size} \
  --left-context-frames ${left_context}
```
Result:
With both chunk=16 and chunk=32, I cannot get a better WER with fast_beam_search_LG.