k2-fsa / sherpa

Speech-to-text server framework with next-gen Kaldi
https://k2-fsa.github.io/sherpa
Apache License 2.0

decoded text not similar #406

Open laishramrahul opened 1 year ago

laishramrahul commented 1 year ago

I have built models based on the conformer-ctc librispeech recipe. I am comparing the text decoded by sherpa's offline_ctc_asr with my reference decoding of the test set. The decoded text is not identical for the same file. I want to get exactly the same decoded text; please help.

pkufool commented 1 year ago

Do you use the same decoding method? Does this happen for all files (i.e. are the WERs of a bunch of files worse) or just for one wav?

laishramrahul commented 1 year ago

The current model is trained for epochs 0-19.

The test files are decoded using "./conformer_ctc/decode.py --epoch 19 --avg 1 --exp-dir conformer_ctc/exp".

The 19th-epoch model is exported with "python conformer_ctc/export.py --epoch 19 --avg 1 --exp-dir conformer_ctc/exp --lang-dir data/lang_bpe_500 --jit 1" so that it can be used with sherpa.

The exported model is used with "./sherpa/bin/offline_ctc_asr.py --nn-model conformer_ctc/exp/cpu_jit.pt --tokens data/lang_bpe_500/tokens.txt --use-gpu false --HLG data/lang_bpe_500/HLG.pt --lm-scale 5.0 audio_files/1000000194.wav". I have checked different values of --lm-scale on a few different files, but the decoded text given by decode.py and offline_ctc_asr.py is not the same.
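To make "not the same" quantitative when comparing the two transcripts, one option is to compute a word-level edit distance (the basis of WER) between the decode.py output and the sherpa output for the same wav. The sketch below is a hypothetical, dependency-free helper, not part of sherpa or icefall; the example strings are made up.

```python
def word_edit_distance(ref: str, hyp: str) -> int:
    """Minimum number of word insertions/deletions/substitutions (Levenshtein over words)."""
    r, h = ref.split(), hyp.split()
    # prev[j] holds the edit distance between r[:i-1] and h[:j]
    prev = list(range(len(h) + 1))
    for i in range(1, len(r) + 1):
        cur = [i] + [0] * len(h)
        for j in range(1, len(h) + 1):
            cost = 0 if r[i - 1] == h[j - 1] else 1
            cur[j] = min(prev[j] + 1,         # deletion
                         cur[j - 1] + 1,      # insertion
                         prev[j - 1] + cost)  # substitution or match
        prev = cur
    return prev[len(h)]


def wer(ref: str, hyp: str) -> float:
    """Word error rate of hyp against ref, after upper-casing both sides."""
    ref, hyp = ref.strip().upper(), hyp.strip().upper()
    return word_edit_distance(ref, hyp) / max(len(ref.split()), 1)


# Hypothetical transcripts for one file from the two decoding paths:
decode_py_text = "HELLO WORLD THIS IS A TEST"
sherpa_text = "HELLO WORLD THIS IS THE TEST"
print(f"WER: {wer(decode_py_text, sherpa_text):.2%}")  # → WER: 16.67%
```

Running this over the whole test set would show whether the mismatch is a small, uniform degradation (suggesting a decoding-configuration difference such as lm-scale or search beam) or a gross error on particular files.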

pkufool commented 1 year ago

Could you post the decoding logs of ./conformer_ctc/decode.py --epoch 19 --avg 1 --exp-dir conformer_ctc/exp and ./sherpa/bin/offline_ctc_asr.py --nn-model conformer_ctc/exp/cpu_jit.pt --tokens data/lang_bpe_500/tokens.txt --use-gpu false --HLG data/lang_bpe_500/HLG.pt --lm-scale 5.0 audio_files/1000000194.wav, so that we can compare the decoding configurations?