asusdisciple closed this issue 5 months ago
Hey @asusdisciple - what language were you using? It would be really helpful to have a reproducible end-to-end script I can use to get the same results that you're reporting.
We use the script `run_eval.py` with the following launch command: https://github.com/huggingface/distil-whisper/blob/main/training/flax/evaluation_scripts/test/run_distilled.sh
If you execute this, you'll get the results quoted in the paper.
I'm sorry, I just overlooked that Distil-Whisper is English-only (even though it performs very well on a few other languages as well).
Hey @asusdisciple, no worries! If you're interested in training Whisper on a different language, you can leverage the training code under distil-whisper/training. I recommend first setting up a baseline using these instructions: https://huggingface.co/sanchit-gandhi/distil-whisper-large-v3-de-kd#training-procedure
On which dataset exactly did you evaluate the model? I benchmarked this model on the original FLEURS dataset, along with all other implementations of Whisper. It performed far worse, with a WER of 1.5 compared to ~0.46 for the original Whisper. Did I make an implementation error?
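For reference, here is a minimal sketch of how such a WER score is typically computed with the Hugging Face `evaluate` library; note that it returns a fraction rather than a percentage (the transcripts below are placeholders, not FLEURS data):

```python
# Minimal WER sketch using the Hugging Face `evaluate` library; the
# transcripts are placeholders, not actual FLEURS predictions.
import evaluate

wer_metric = evaluate.load("wer")

predictions = ["the cat sat on the mat"]           # model output (placeholder)
references = ["the cat sat on the mat yesterday"]  # ground truth (placeholder)

# WER = (substitutions + deletions + insertions) / reference word count.
# Here: 1 deletion over 7 reference words, so ~0.143 (i.e. 14.3%).
print(wer_metric.compute(predictions=predictions, references=references))
```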
Here is how I initialize the model, with `temp=0`, `beams=1`, `do_sample=True`, and how I call transcribe:
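A minimal sketch of what that setup might look like, assuming `temp`/`beams` map to the `transformers` generate arguments `temperature`/`num_beams`; the checkpoint name and audio path are placeholders, and `do_sample=True` is swapped for greedy decoding in the sketch, since transformers rejects sampling with a zero temperature:

```python
# A minimal reconstruction, assuming the transformers pipeline API; the
# checkpoint and audio file below are placeholders, not the exact setup.
from transformers import pipeline

pipe = pipeline(
    "automatic-speech-recognition",
    model="distil-whisper/distil-large-v2",  # placeholder checkpoint
    device=0,  # assumes a single GPU; use device=-1 for CPU
)

# Settings quoted above: temperature=0, num_beams=1. Combining do_sample=True
# with temperature=0 is degenerate (transformers raises an error for a zero
# temperature when sampling), so greedy decoding is the runnable equivalent.
result = pipe(
    "sample.wav",  # placeholder audio path
    generate_kwargs={"temperature": 0.0, "num_beams": 1, "do_sample": False},
)
print(result["text"])
```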