Closed · guynich closed this 4 months ago
The above table is with `--language "en"` in the short-form bash scripts. Removing this flag and rerunning the evaluation yields lower `eval/wer` values.
E.g.:

| model | eval/wer with `--language "en"` | eval/wer without `--language` | HF model card WER |
|---|---|---|---|
| OpenAI Large-v2 | 3.1683 | 2.5685 | 3.0004 |
| OpenAI Small | 4.0682 | 3.44541 | 3.4322 |
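To illustrate what the flag changes, here is a minimal Python sketch using the Hugging Face `transformers` pipeline. This is not the repo's eval script; it only shows the generation-time behavior the flag controls, on one LibriSpeech test-clean utterance:

```python
# Minimal sketch of what --language "en" controls at generation time;
# not the repo's eval script.
from datasets import load_dataset
from transformers import pipeline

sample = load_dataset("librispeech_asr", "clean", split="test")[0]
audio = {"array": sample["audio"]["array"],
         "sampling_rate": sample["audio"]["sampling_rate"]}

asr = pipeline("automatic-speech-recognition", model="openai/whisper-large-v2")

# With --language "en": the English language token is forced.
forced = asr(audio, generate_kwargs={"language": "en", "task": "transcribe"})

# Without --language: Whisper predicts the language token itself, which
# can change the decoded text and therefore the measured eval/wer.
detected = asr(audio)

print(forced["text"])
print(detected["text"])
```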
Without the `--language` flag:

- Large-v2: `eval/wer` is lower than the HuggingFace model card WER value, and lower than the original OpenAI paper result of 2.7% in Table 2.
- Small: `eval/wer` is similar to the HuggingFace model card WER value.

Added the Tiny model script and result here: https://github.com/guynich/distil-whisper/tree/main/training/scripts#summary.
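For reference, the `eval/wer` numbers above are percentages. A minimal sketch of the metric computation with the Hugging Face `evaluate` library (the two lists are placeholders, not real decodes):

```python
# Minimal sketch of the metric behind the eval/wer columns above.
import evaluate

wer_metric = evaluate.load("wer")

predictions = ["hello world", "good morning"]          # model decodes (placeholder)
references = ["hello world", "good morning everyone"]  # reference transcripts (placeholder)

# compute() returns a fraction; the tables above report percentages.
# Note: whether both sides are text-normalized before scoring also
# changes the reported number.
wer = 100 * wer_metric.compute(predictions=predictions, references=references)
print(f"eval/wer: {wer:.4f}")
```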
I'm closing this issue: the Small and Tiny model results for the HF model card and `eval/wer` without the `--language` option are sufficiently aligned for me. (I don't understand the discrepancy in the Large-v2 values but can leave that issue.)
Hi, I'm enjoying working with this fascinating repo.

Looking at the Stage 4 short-form evaluation, I modified the short-form evaluation bash script for the LibriSpeech clean dataset (test split) for the OpenAI Large-v2 model here and the Small model here. The generated WER % results are higher than the HuggingFace model card evaluation WER results, which is unexpected.

Any suggestions as to what might be causing these WER value differences (perhaps my short-form eval bash scripts)?
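For context, a rough sketch of the setup I'm describing, using the standard Hub identifiers for the dataset and model (this is not the repo's script, just the shape of the evaluation):

```python
# Rough sketch of the short-form setup: LibriSpeech clean, test split,
# decoded with OpenAI Large-v2. Not the repo's eval script.
from datasets import load_dataset
from transformers import pipeline

dataset = load_dataset("librispeech_asr", "clean", split="test")
asr = pipeline("automatic-speech-recognition", model="openai/whisper-large-v2")

predictions, references = [], []
for sample in dataset.select(range(8)):  # a few utterances for illustration
    audio = {"array": sample["audio"]["array"],
             "sampling_rate": sample["audio"]["sampling_rate"]}
    predictions.append(asr(audio)["text"])
    references.append(sample["text"])
```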