Open madroidmaq opened 7 months ago
You will need to train a new distilled model for it to work with v3. The current one won't work out of the box.
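For context on what "training a new distilled model" involves: Distil-Whisper trains the student on a weighted sum of a KL-divergence term against the teacher's output distribution and a cross-entropy term against pseudo-labels. A minimal sketch of that objective (the `alpha` and `temperature` values here are illustrative, not the repo's actual config):

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      alpha=0.8, temperature=2.0):
    """Weighted KD objective: KL(teacher || student) + cross-entropy.

    student_logits / teacher_logits: (batch, seq_len, vocab)
    labels: (batch, seq_len) pseudo-label token ids
    """
    # KL divergence between temperature-softened distributions;
    # scaled by T^2 to keep gradient magnitudes comparable.
    kd = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * (temperature ** 2)

    # Standard cross-entropy against the (pseudo-)labels.
    ce = F.cross_entropy(
        student_logits.view(-1, student_logits.size(-1)),
        labels.view(-1),
    )
    return alpha * kd + (1 - alpha) * ce
```

The point being: swapping the teacher to large-v3 means re-running this whole training loop with large-v3 logits and pseudo-labels, not just swapping checkpoints.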
Would be cool to start a new distillation run for Whisper-large-v3 indeed! Let's see if we find some compute
Would be nice to know approximately how much compute/time is required to train an X-size model on Y amount of data. For example, is a single 4090 enough to distil? If so, approximately how long would it take (assuming no bottlenecks)?
We mainly trained on TPUv4's here. @sanchit-gandhi will know best what hardware is needed I believe :-)
Whisper large-v3 shows a further drop in WER compared to large-v2, so hopefully a corresponding distilled large-v3 version will be made available.