Open maxlund opened 2 hours ago
Including a minimal example to reproduce here:
```python
import stable_whisper
import torch

model_path = "/path/to/large-v3-turbo.pt"
audio_paths = [
    "/path/to/mozart-of-gen-z-interview.mp3",
    "/path/to/long-audio.mp3",
]

model = stable_whisper.load_model(model_path, device=torch.device('cuda'))

segments_and_start_times = list()
for audio_path in audio_paths:
    whisper_result = model.transcribe(audio=audio_path, vad=True, language="english", verbose=False)
    # iterate the result's segments, each with .start / .end / .text
    for res in whisper_result.segments:
        segments_and_start_times.append([res.start, res.text, res.end])

print(segments_and_start_times)
```
Hi,
First off, thank you for this great implementation, really good stuff!
When using the newest stable-ts version on Windows to run the `large-v3-turbo` model, I think there might be a memory leak of some sort when transcribing longer (1h+) audio: RAM (not VRAM) usage increases steadily until we eventually get an OOM error.
I uploaded the audio file (runtime 02:27:47) that caused the error above here.
We also have a very long audio file uploaded here (10h+ long, mostly silence), which you could perhaps use if the file above does not reproduce the issue.
We have been using your library for a while and didn't observe any of these issues prior to switching over to the `large-v3-turbo` model and the latest version of the library. Any ideas?

Thanks again for all your fantastic work here!