Hi all, thank you for open-sourcing this, it will be very helpful for me in the future!
One question though: I wonder whether it will be possible to train the large model with only the 24 GB of VRAM available on consumer cards: https://huggingface.co/openai/whisper-large-v2/discussions/21
They link to this DeepSpeed guide: https://github.com/huggingface/community-events/tree/main/whisper-fine-tuning-event#deepspeed
Might this be something interesting for this project too?
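
For context, this is roughly the setup I had in mind (untested on my side, and the output path and the batch/accumulation numbers are only guesses): a minimal DeepSpeed ZeRO stage-2 config with CPU optimizer offload, passed through the Hugging Face `deepspeed` training argument.

```python
# Sketch only: a minimal ZeRO stage-2 config with CPU optimizer offload,
# wired into the Hugging Face Trainer via the `deepspeed` argument.
import json
from transformers import Seq2SeqTrainingArguments

ds_config = {
    "zero_optimization": {
        "stage": 2,
        "offload_optimizer": {"device": "cpu"},  # move optimizer states off the GPU
    },
    "fp16": {"enabled": "auto"},                  # "auto" = inherit from TrainingArguments
    "train_micro_batch_size_per_gpu": "auto",
    "gradient_accumulation_steps": "auto",
}
with open("ds_config.json", "w") as f:
    json.dump(ds_config, f)

training_args = Seq2SeqTrainingArguments(
    output_dir="./whisper-large-finetuned",   # hypothetical output path
    per_device_train_batch_size=2,            # small micro-batch so it fits in 24 GB (my guess)
    gradient_accumulation_steps=128,          # keep the effective batch size large
    fp16=True,
    deepspeed="ds_config.json",               # enable the DeepSpeed integration
)
```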
On a related note, the less VRAM we have, the smaller the batch size we need to set. If I remember correctly, the original Whisper models were trained with a batch size of 1024, except for the large model, which used 256. Would the batch size potentially influence the final result, or is it "just" a matter of speed?
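
To spell out my assumption here (please correct me if I'm wrong): with gradient accumulation, the optimizer step only ever sees the *effective* batch size, so a small per-GPU batch on a 24 GB card should mostly cost speed rather than final quality. The numbers below are just illustrative.

```python
# Effective batch size with gradient accumulation (illustrative values only).
per_device_train_batch_size = 2     # what I assume fits in 24 GB for whisper-large
gradient_accumulation_steps = 128   # gradients are summed before each optimizer step
num_gpus = 1                        # single consumer card

effective_batch_size = per_device_train_batch_size * gradient_accumulation_steps * num_gpus
print(effective_batch_size)  # 256, matching the batch size I recall for the large model
```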