In my program I used faster-whisper to transcribe an audio file. The large-v2 model running in float16 took 10 minutes to process the Sam Altman audio file.
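For reference, the baseline was a plain sequential faster-whisper call along these lines (a minimal sketch; the filename is just a placeholder):

```python
from faster_whisper import WhisperModel

# Baseline setup: large-v2 in float16 on the GPU, decoding segments sequentially.
model = WhisperModel("large-v2", device="cuda", compute_type="float16")

# "sam_altman.mp3" stands in for the Sam Altman test file mentioned above.
segments, info = model.transcribe("sam_altman.mp3", beam_size=5)
for segment in segments:
    print(f"[{segment.start:.2f} -> {segment.end:.2f}] {segment.text}")
```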
After implementing this library I got the following:
- large-v2, float16, batch size 50 = 54 seconds
- medium.en, float16, batch size 75 = 32 seconds
- small.en, float16, batch size 100 = 15 seconds!
Amazing!
Tests were run on an RTX 4090 with CUDA 12 and PyTorch 2.2.0. Just thought you'd like to know.
Also, that's using the higher-quality ASR parameters.
If you increase the batch size (regardless of the Whisper model size) to the point where it exceeds available VRAM, speeds drop significantly, but this is expected behavior.
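In case it helps anyone reproduce the numbers, the batched runs looked roughly like this. This is only a sketch assuming a WhisperS2T-style API with the CTranslate2 backend (the library isn't named above); the filename, language, and the compute_type kwarg are illustrative assumptions, not copied from my program:

```python
import whisper_s2t

# Sketch of the first row above: large-v2, float16, batch size 50.
# compute_type="float16" is assumed to be accepted here; adjust if your version differs.
model = whisper_s2t.load_model(
    model_identifier="large-v2",
    backend="CTranslate2",
    compute_type="float16",
)

files = ["sam_altman.mp3"]   # placeholder filename
lang_codes = ["en"]
tasks = ["transcribe"]
initial_prompts = [None]

out = model.transcribe_with_vad(
    files,
    lang_codes=lang_codes,
    tasks=tasks,
    initial_prompts=initial_prompts,
    batch_size=50,           # pushing this past available VRAM is what tanks the speed
)

print(out[0][0])  # first utterance of the first file
```

Between the three rows above, only the model identifier and batch_size would change.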
https://github.com/BBC-Esq/ChromaDB-Plugin-for-LM-Studio/releases/tag/v4.0.0