k2-fsa / sherpa

Speech-to-text server framework with next-gen Kaldi
https://k2-fsa.github.io/sherpa
Apache License 2.0

High Word Error Rate (WER %) with Whisper Large V3 #552

Status: Open. zhao-lun opened this issue 3 months ago.

zhao-lun commented 3 months ago

Model: large_whisperv3
Hardware: A100
Dataset: aishell

Client command:

num_task=16
python3 client.py \
    --server-addr localhost \
    --model-name whisper \
    --num-tasks $num_task \
    --whisper-prompt "<|startoftranscript|><|zh|><|transcribe|><|notimestamps|>" \
    --manifest-dir /sample_dataset/aishell1_test/
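The `--whisper-prompt` value is a sequence of Whisper's standard special tokens: start of transcript, language, task, and "no timestamps". A minimal sketch of assembling it for other languages or tasks; `build_whisper_prompt` is a hypothetical helper, not something client.py provides:

```python
def build_whisper_prompt(language: str = "zh", task: str = "transcribe",
                         timestamps: bool = False) -> str:
    """Hypothetical helper: assemble a Whisper decoder prompt from special tokens."""
    if task not in ("transcribe", "translate"):
        raise ValueError(f"unsupported task: {task!r}")
    tokens = ["<|startoftranscript|>", f"<|{language}|>", f"<|{task}|>"]
    if not timestamps:
        # Ask the model not to emit timestamp tokens.
        tokens.append("<|notimestamps|>")
    return "".join(tokens)


if __name__ == "__main__":
    # Reproduces the prompt string used in the command above.
    print(build_whisper_prompt())
    # <|startoftranscript|><|zh|><|transcribe|><|notimestamps|>
```

Passing the language explicitly, rather than letting the model detect it, keeps every request in the batch decoding as Mandarin.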

The server was built using the sample Dockerfile.

Output:
RTF: 0.0092
total_duration: 32590.000 seconds (9.05 hours)
processing time: 299.156 seconds (0.08 hours)
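These figures are consistent with RTF being wall-clock processing time divided by total audio duration. A quick check under that assumption (`real_time_factor` is a hypothetical name, not taken from client.py):

```python
def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """RTF = wall-clock processing time / total audio duration (lower is faster)."""
    if audio_seconds <= 0:
        raise ValueError("audio duration must be positive")
    return processing_seconds / audio_seconds


total_duration = 32590.000   # seconds of audio, from the output above
processing_time = 299.156    # seconds of wall-clock time, from the output above

print(f"RTF: {real_time_factor(processing_time, total_duration):.4f}")   # RTF: 0.0092
print(f"{total_duration / 3600:.2f} h audio, {processing_time / 3600:.2f} h processing")
# 9.05 h audio, 0.08 h processing
```

So throughput looks healthy; the problem reported here is accuracy, not speed.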

%WER = 53.34
Errors: 55 insertions, 0 deletions, 3773 substitutions, over 7176 reference words (3403 correct)
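The error rate is the sum of substitutions, deletions and insertions divided by the number of reference tokens, which reproduces the 53.34 figure. One thing worth checking (an inference, not confirmed in this thread): the AISHELL-1 test set is commonly reported as 7176 utterances, the same as the 7176 "reference words" here. Chinese transcripts contain no spaces, so splitting on whitespace would yield one "word" per utterance, and a single wrong character or punctuation mark would then count the whole utterance as one substitution. A small sketch (`error_rate` is hypothetical and the sentence is made up):

```python
def error_rate(substitutions: int, deletions: int, insertions: int,
               num_ref_tokens: int) -> float:
    """Hypothetical helper: (S + D + I) / N, as a percentage."""
    if num_ref_tokens <= 0:
        raise ValueError("reference must contain at least one token")
    return 100.0 * (substitutions + deletions + insertions) / num_ref_tokens


S, D, I, N = 3773, 0, 55, 7176                   # counts from the report above
print(f"%WER = {error_rate(S, D, I, N):.2f}")    # %WER = 53.34
print(f"correct = {N - S - D}")                  # correct = 3403

# Chinese text has no spaces: whitespace splitting gives one token per utterance,
# while character-level splitting gives one token per character.
ref = "今天天气很好"                               # made-up example sentence
print(len(ref.split()), len(list(ref)))          # 1 6
```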

Hi, I followed the setup instructions and everything ran without issues. However, the WER is unexpectedly high (53.34%). Is this expected?

yuekaizhang commented 3 months ago

@zhao-lun Please check https://github.com/k2-fsa/icefall/blob/master/egs/aishell/ASR/whisper/decode.py#L286-L288 and normalize the text before computing the metrics.

If you have some free time, feel free to make a PR to triton-asr-client/client.py.
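The linked decode.py lines are the authoritative reference for what icefall normalizes. As an illustration only, here is a minimal sketch that assumes punctuation, symbols and whitespace should be dropped and that scoring is per character (CER), as is usual for AISHELL. `normalize_zh`, `levenshtein` and `cer` are hypothetical names, and the sketch does not claim to reproduce decode.py. Whisper can also emit Traditional characters for Mandarin; converting those to Simplified (for example with OpenCC) is a further step this sketch leaves out.

```python
import unicodedata


def normalize_zh(text: str) -> str:
    """Hypothetical normalizer: NFKC-fold, then drop punctuation, symbols and whitespace."""
    text = unicodedata.normalize("NFKC", text)   # e.g. full-width "，" -> ","
    return "".join(
        ch for ch in text
        if not ch.isspace()
        and not unicodedata.category(ch).startswith(("P", "S", "Z"))
    )


def levenshtein(ref: list[str], hyp: list[str]) -> int:
    """Minimum number of substitutions, deletions and insertions turning ref into hyp."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(
                prev[j] + 1,             # deletion of r
                cur[j - 1] + 1,          # insertion of h
                prev[j - 1] + (r != h),  # substitution (or match when equal)
            )
        prev = cur
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate (%) after normalizing both sides."""
    r, h = list(normalize_zh(ref)), list(normalize_zh(hyp))
    return 100.0 * levenshtein(r, h) / max(len(r), 1)


if __name__ == "__main__":
    ref = "今天天气很好"      # made-up reference
    hyp = "今天天气很好。"    # hypothesis with a trailing full stop
    print(levenshtein(list(ref), list(hyp)))  # 1: the raw full stop is an insertion
    print(f"{cer(ref, hyp):.2f}")             # 0.00 after normalization
```

Scoring characters instead of whitespace-separated tokens, and normalizing both sides first, keeps a single punctuation mark from turning an otherwise correct utterance into an error; how much of the 53.34% this recovers is not established in the thread.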