jianfch / stable-ts

Transcription, forced alignment, and audio indexing with OpenAI's Whisper

Is there a way to specify the batch size to reduce VRAM? #398

Closed · dimitrios-git closed this issue 2 months ago

dimitrios-git commented 2 months ago

Running large models on large files requires a lot of memory, but other libraries like whisper and WhisperX let you pass a batch_size to the CLI. Is there an equivalent option for stable-ts?

jianfch commented 2 months ago

Batch transcription is not supported by the original Whisper models (i.e. the batch size is always 1), so there is no batch size parameter to adjust for reducing memory usage.

However, you can reduce memory usage with the Hugging Face models by specifying batch_size, because the default, batch_size=24, uses significantly more memory than the original models:
https://github.com/jianfch/stable-ts/blob/6d066308ed5a3328a69006d3a7d4496315736c0f/stable_whisper/whisper_word_level/hf_whisper.py#L186
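For illustration, here is a minimal sketch of lowering batch_size through the Python API. The model name and audio path are placeholders, and it assumes transcribe() forwards batch_size to the underlying pipeline, as the linked hf_whisper.py suggests:

```python
import stable_whisper

# Load the Hugging Face implementation of Whisper via stable-ts.
model = stable_whisper.load_hf_whisper('large-v3')

# Lower batch_size from the default of 24 to trade throughput for VRAM.
# 'audio.mp3' is a placeholder path.
result = model.transcribe('audio.mp3', batch_size=4)
result.to_srt_vtt('audio.srt')
```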

The best way to reduce memory usage is to use a distilled and/or quantized large model from Faster-Whisper or Hugging Face.
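A rough sketch of that approach with Faster-Whisper (the checkpoint name 'distil-large-v3' and compute_type='int8' are Faster-Whisper options chosen as examples; keyword arguments are assumed to be forwarded to faster_whisper.WhisperModel):

```python
import stable_whisper

# Load a distilled checkpoint with int8 quantization through Faster-Whisper.
model = stable_whisper.load_faster_whisper(
    'distil-large-v3',
    compute_type='int8',
)

# Depending on the stable-ts version, the stable-ts transcription method
# may be model.transcribe() or model.transcribe_stable().
result = model.transcribe('audio.mp3')
result.to_srt_vtt('audio.srt')
```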