huggingface / distil-whisper

Distilled variant of Whisper for speech recognition. 6x faster, 50% smaller, within 1% word error rate.

Finetuning on which model? #104

Open RohitMidha23 opened 3 months ago

RohitMidha23 commented 3 months ago

As you mentioned, we should fine-tune when the WER is > 20% and the dataset size is < 1000 hours.

This is my case as well: I have a fine-tuned model with WER = 48% and a dataset size of 100 hours.

My question is: do you fine-tune the model created through `create_student_model`, or are we better off fine-tuning a tiny/small model directly?
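
For concreteness, a minimal sketch of the two starting points I am comparing (the local save path below is hypothetical; `openai/whisper-small` is the off-the-shelf checkpoint):

```python
from transformers import WhisperForConditionalGeneration

# Option A: a student initialised from a teacher via create_student_model,
# loaded from a local directory ("./distil-student-init" is a hypothetical path)
student = WhisperForConditionalGeneration.from_pretrained("./distil-student-init")

# Option B: an off-the-shelf small checkpoint, fine-tuned directly
small = WhisperForConditionalGeneration.from_pretrained("openai/whisper-small")
```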

Thanks for your time @sanchit-gandhi!

afsara-ben commented 2 months ago

Good question! I also want to know. From a preliminary experiment, I found that fine-tuning the model created with `create_student_model` generated only empty transcripts (my fine-tuning data was < 1 hr), whereas fine-tuning tiny/small directly on the same < 1 hr of data yielded better results.
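
For anyone reproducing this, here is roughly the sanity check I run to see whether a checkpoint emits empty transcripts on a single clip. The checkpoint name is illustrative (swap in your `create_student_model` output), and the dataset is just a small public test fixture:

```python
import torch
from datasets import load_dataset
from transformers import WhisperProcessor, WhisperForConditionalGeneration

# Illustrative checkpoint; replace with the local create_student_model output
checkpoint = "openai/whisper-tiny"
processor = WhisperProcessor.from_pretrained(checkpoint)
model = WhisperForConditionalGeneration.from_pretrained(checkpoint)

# One validation clip from a small public ASR test set
sample = load_dataset("hf-internal-testing/librispeech_asr_dummy",
                      "clean", split="validation")[0]["audio"]
inputs = processor(sample["array"], sampling_rate=sample["sampling_rate"],
                   return_tensors="pt")

# If the decoded output is an empty string, the model is degenerating
with torch.no_grad():
    ids = model.generate(inputs.input_features)
print(processor.batch_decode(ids, skip_special_tokens=True))
```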