huggingface / distil-whisper

Distilled variant of Whisper for speech recognition. 6x faster, 50% smaller, within 1% word error rate.

Hopefully the large-v3 version will be supported #23

Open madroidmaq opened 7 months ago

madroidmaq commented 7 months ago

Whisper large-v3 shows a further drop in WER compared to large-v2, so it is hoped that a corresponding distil-whisper large-v3 version will be made available.

[image attachment]

vvvm23 commented 7 months ago

You will need to train a new distilled model for it to work with v3. The current one won't work out of the box.
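For illustration, one concrete incompatibility shows up in the audio front-end: large-v3 switched to 128 mel bins, whereas the v2-based distil checkpoint expects 80. A minimal sketch of checking this with the public Hub checkpoints (the printed values are what I'd expect, not a guarantee):

```python
# Compare the feature extractors of the two checkpoints; large-v3 changed the
# number of mel bins, so the existing distilled weights can't be reused as-is.
from transformers import WhisperFeatureExtractor

fe_distil_v2 = WhisperFeatureExtractor.from_pretrained("distil-whisper/distil-large-v2")
fe_large_v3 = WhisperFeatureExtractor.from_pretrained("openai/whisper-large-v3")

print(fe_distil_v2.feature_size)  # expected: 80 mel bins (same as large-v2)
print(fe_large_v3.feature_size)   # expected: 128 mel bins
```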

patrickvonplaten commented 7 months ago

Would be cool to start a new distillation run for Whisper-large-v3 indeed! Let's see if we can find some compute.
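For anyone curious, the training objective is conceptually a weighted combination of cross-entropy on pseudo-labels and KL divergence to the teacher's token distribution. A rough PyTorch sketch (not the repo's training script; `alpha` and `temperature` are illustrative values):

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, alpha=0.8, temperature=2.0):
    # Cross-entropy of the student against the (pseudo-)labels.
    ce = F.cross_entropy(
        student_logits.view(-1, student_logits.size(-1)),
        labels.view(-1),
        ignore_index=-100,
    )
    # KL divergence between temperature-softened student and teacher distributions.
    kl = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * (temperature ** 2)
    return alpha * kl + (1.0 - alpha) * ce
```

A new large-v3 student would need this run over a large pseudo-labelled corpus with whisper-large-v3 as the teacher, which is where the compute cost comes from.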

AnkushMalaker commented 7 months ago

It would be nice to know approximately how much compute/time is required to train an X-size model on Y amount of data. For example, is a 4090 enough to distil? If so, approximately how long? (Assuming no bottlenecks.)

patrickvonplaten commented 7 months ago

We mainly trained on TPU v4s here. @sanchit-gandhi will know best what hardware is needed, I believe :-)