huggingface / distil-whisper

Distilled variant of Whisper for speech recognition. 6x faster, 50% smaller, within 1% word error rate.
MIT License

Smaller models? #35

Closed · regularfry closed this issue 6 months ago

regularfry commented 7 months ago

Unless I've missed something, it's not clear whether the same technique works to accelerate small.en and the smaller Whisper models. Is that something you've looked at? If not, would there be any mileage in training it up?

small.en in particular is interesting because it's the biggest model that fits onto a Raspberry Pi Zero 2, but it isn't quite fast enough for real-time use. Speeding it up would be transformative.

sanchit-gandhi commented 7 months ago

See #14 for a discussion on distilling smaller models. The technique should indeed work, and we are in the process of distilling small.en!

regularfry commented 7 months ago

Excellent news, thank you.

sanchit-gandhi commented 7 months ago

The training code has been released in this folder in case you want to try it yourself: https://github.com/huggingface/distil-whisper/tree/main/training
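
The actual scripts and launch commands are documented in that folder; purely as a generic illustration (not the repo's code), the distillation objective typically combines a cross-entropy term on the target tokens with a KL term that pulls the student's output distribution toward the teacher's:

```python
# Generic knowledge-distillation loss sketch (illustrative only; the real
# training code lives in the linked folder). Assumed shapes:
# student_logits / teacher_logits: (batch, seq_len, vocab), labels: (batch, seq_len).
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, alpha=0.8, temperature=2.0):
    # Cross-entropy of the student against the ground-truth (or pseudo-labelled) tokens.
    ce = F.cross_entropy(student_logits.transpose(1, 2), labels)
    # KL divergence between the softened teacher and student distributions.
    kl = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2
    # alpha weights the KL term; the exact weighting here is a placeholder.
    return alpha * kl + (1.0 - alpha) * ce
```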

sanchit-gandhi commented 6 months ago

Here you go! https://huggingface.co/distil-whisper/distil-small.en
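
A minimal sketch of loading the released checkpoint with the transformers ASR pipeline (the model ID is taken from the link above; the audio path is a placeholder):

```python
from transformers import pipeline

# Load the distilled small.en checkpoint from the Hugging Face Hub.
pipe = pipeline(
    "automatic-speech-recognition",
    model="distil-whisper/distil-small.en",
)

# "sample.wav" is a placeholder path to a local audio file.
print(pipe("sample.wav")["text"])
```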

regularfry commented 6 months ago

Thanks! I'll open a separate issue, but it's considerably slower than the original small.en on my M1 under whisper.cpp. That's deeply unintuitive, and it's not clear where the problem might be.
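
As a rough way to sanity-check the slowdown, one could time both checkpoints side by side. The sketch below compares them in transformers rather than whisper.cpp (which is what the comment above refers to), so it only isolates the models, not the whisper.cpp integration; the audio path is a placeholder.

```python
import time
from transformers import pipeline

audio = "sample.wav"  # placeholder path to a local audio file

for model_id in ("openai/whisper-small.en", "distil-whisper/distil-small.en"):
    pipe = pipeline("automatic-speech-recognition", model=model_id)
    start = time.perf_counter()
    pipe(audio)
    print(f"{model_id}: {time.perf_counter() - start:.2f}s")
```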