k2-fsa / icefall

https://k2-fsa.github.io/icefall/
Apache License 2.0

In wenetspeech recipe, fast_beam_search_LG almost always gets worse WER results than greedy search! #1211

Open zhangzhengyireal opened 1 year ago

zhangzhengyireal commented 1 year ago

Collecting environment information...
k2 version: 1.24.3
Build type: Release
Git SHA1: 42e92fdd4097adcfe9937b4d2df7736d227b8e85
Git date: Wed Jun 28 09:50:36 2023
Cuda used to build k2: 11.6
cuDNN used to build k2: 8.2.0
Python version used to build k2: 3.9
OS used to build k2: Ubuntu 20.04.6 LTS
CMake version: 3.26.4
GCC version: 7.5.0
PyTorch version used to build k2: 1.13.1+cu116
PyTorch is using Cuda: 11.6
NVTX enabled: True
With CUDA: True
Disable debug: True
Sync kernels : False
Disable checks: False
Max cpu memory allocate: 214748364800 bytes (or 200.0 GB)
k2 abort: False

Resource: https://huggingface.co/pkufool/icefall-asr-zipformer-streaming-wenetspeech-20230615
Testset: wenetspeech DEV

Bash command:

exp_dir=download/huggingface/icefall-asr-zipformer-streaming-wenetspeech-20230615/exp
lang_dir=download/huggingface/icefall-asr-zipformer-streaming-wenetspeech-20230615/data/lang_char
decode_method=greedy_search
#decode_method=fast_beam_search_LG

./zipformer/decode.py \
  --epoch ${ep} \
  --avg ${avg} \
  --exp-dir ${exp_dir}/ \
  --lang-dir ${lang_dir} \
  --max-duration 800 \
  --decoding-method ${decode_method} \
  --blank-penalty ${blank_penalty} \
  --ngram-lm-scale ${nls} \
  --ilme-scale ${ilme_scale} \
  --manifest-dir data/fbank/ \
  --causal 1 \
  --chunk-size ${chunk_size} \
  --left-context-frames ${left_context}

Result: [attached image of WER results]

With both chunk=16 and chunk=32, fast_beam_search_LG does not give me a better WER than greedy_search.

danpovey commented 1 year ago

Have you tried the LODR method? Also, assuming your LG is based on Chinese words, what is the vocabulary coverage of your dev set like?
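Dan's second question can be checked directly: if many dev-set words are missing from the LG's lexicon, the graph can only map them to `<UNK>` (or force a wrong path), which would drag fast_beam_search_LG below lexicon-free greedy search. A minimal sketch of such a check, assuming whitespace-segmented references and a word set loaded from `words.txt` (the `vocab_coverage` helper and the toy data are hypothetical, not part of icefall):

```python
from collections import Counter

def vocab_coverage(ref_texts, vocab):
    """Return (coverage_ratio, oov_counter) for whitespace-split references.

    ref_texts: iterable of reference transcript strings.
    vocab: set of words present in the LG lexicon (e.g. from words.txt).
    """
    total, oov = 0, Counter()
    for line in ref_texts:
        for w in line.split():
            total += 1
            if w not in vocab:
                oov[w] += 1
    covered = total - sum(oov.values())
    return covered / max(total, 1), oov

# Toy usage with made-up data (7 reference words, 5 in the vocabulary):
refs = ["今天 天气 很好", "我 喜欢 机器 学习"]
vocab = {"今天", "天气", "我", "喜欢", "学习"}
ratio, oov = vocab_coverage(refs, vocab)
# ratio ≈ 0.714; oov contains 很好 and 机器, one occurrence each
```

If the coverage ratio on the real dev set is well below 1.0, that alone could explain LG underperforming.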

desh2608 commented 1 year ago

In my experiments, I have always found the "nbest" variants to be better than the one-best versions, e.g., fast_beam_search_nbest_LG better than fast_beam_search_LG.

Usually, you would also need to play around with the --beam parameter to balance insertions against deletions. It looks like you have significantly more deletions at the moment; maybe you can try increasing the beam.
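To see whether deletions really dominate, the per-type error counts can be recovered from a standard Levenshtein alignment of reference and hypothesis tokens. This is a minimal standalone sketch (not icefall's actual scorer, which reports the same breakdown in its errs-* files):

```python
def error_breakdown(ref, hyp):
    """Align ref/hyp token lists with Levenshtein dynamic programming and
    count substitutions, insertions (extra hyp tokens), and deletions
    (missing ref tokens)."""
    R, H = len(ref), len(hyp)
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (H + 1) for _ in range(R + 1)]
    for i in range(1, R + 1):
        dp[i][0] = i
    for j in range(1, H + 1):
        dp[0][j] = j
    for i in range(1, R + 1):
        for j in range(1, H + 1):
            diag = dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            dp[i][j] = min(diag, dp[i - 1][j] + 1, dp[i][j - 1] + 1)
    # Backtrack from the corner to classify each edit operation.
    i, j = R, H
    subs = ins = dels = 0
    while i > 0 or j > 0:
        if i > 0 and j > 0 and dp[i][j] == dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1]):
            subs += ref[i - 1] != hyp[j - 1]
            i, j = i - 1, j - 1
        elif i > 0 and dp[i][j] == dp[i - 1][j] + 1:
            dels += 1
            i -= 1
        else:
            ins += 1
            j -= 1
    return {"sub": subs, "ins": ins, "del": dels}

# Toy usage: one deleted token ("很"), no substitutions or insertions.
counts = error_breakdown("今天 天气 很 好".split(), "今天 天气 好".split())
# counts == {"sub": 0, "ins": 0, "del": 1}
```

If deletions dominate across utterances, a larger --beam (or a lower blank penalty) is the natural first knob to turn.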