zallesov opened 1 year ago
Same issue. `del model` followed by `gc.collect()` doesn't completely free up the GPU. I am looking into it.
same issue +1
I have also noticed GPU memory not getting freed after each inference. Is there any way to clear GPU memory efficiently after each inference run?
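One way to check whether cleanup actually returns memory is to compare PyTorch's allocator counters before and after teardown. A minimal sketch (the `model` variable is hypothetical; drop every reference to your own models first):

```python
import gc
import torch

def report_gpu_memory(tag: str) -> None:
    # memory_allocated: bytes held by live tensors;
    # memory_reserved: bytes cached by PyTorch's allocator (still "used" per nvidia-smi).
    allocated = torch.cuda.memory_allocated() / 1e9
    reserved = torch.cuda.memory_reserved() / 1e9
    print(f"{tag}: allocated={allocated:.2f} GB, reserved={reserved:.2f} GB")

report_gpu_memory("before cleanup")
# del model  # hypothetical: delete every reference to the loaded model first
gc.collect()
torch.cuda.empty_cache()
report_gpu_memory("after cleanup")
```

If `reserved` stays high after `empty_cache()`, some reference to the model (or to tensors it produced) is still alive.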
The workaround I found is to delete whisperx and re-import it for every file. Please find the code below:
```python
import gc

def whisperx_trans(audio_file, device="cuda", batch_size=16, number_tokens=None):
    # Import whisperx inside the function so the module itself can be
    # deleted after each file.
    import whisperx

    number_tokens = number_tokens or []
    model = whisperx.load_model(
        "large-v2", device,
        asr_options={"suppress_tokens": [-1] + number_tokens},
    )
    audio = whisperx.load_audio(audio_file)
    result = model.transcribe(audio, batch_size=batch_size)
    print(result["segments"])  # before alignment

    model_a, metadata = whisperx.load_align_model(language_code="en", device=device)
    result2 = whisperx.align(result["segments"], model_a, metadata, audio, device,
                             return_char_alignments=False)
    print(result2["segments"])  # after alignment

    diarize_model = whisperx.DiarizationPipeline(use_auth_token="<API_Key>", device=device)
    diarize_segments = diarize_model(audio_file)
    result4 = whisperx.assign_word_speakers(diarize_segments, result2)
    print(diarize_segments)
    print(result4["segments"])  # segments are now assigned speaker IDs
    print(diarize_segments.head())
    print(diarize_segments.columns)

    # Drop every model reference and the module itself, then release
    # PyTorch's cached GPU memory.
    del model
    del model_a
    del diarize_model
    del whisperx
    import torch
    torch.cuda.empty_cache()
    del torch
    gc.collect()
    return result4, diarize_segments
```
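If deleting and re-importing the module is not enough on your setup, a heavier-handed variant of the same idea is to run each file in a short-lived worker process, so the driver reclaims all GPU memory when the process exits. A minimal sketch, assuming `whisperx_trans` from above is defined at module level:

```python
import multiprocessing as mp

def transcribe_in_subprocess(audio_file):
    # "spawn" gives the worker a fresh CUDA context, independent of the parent;
    # maxtasksperchild=1 makes the worker exit (and free the GPU) after one file.
    ctx = mp.get_context("spawn")
    with ctx.Pool(processes=1, maxtasksperchild=1) as pool:
        return pool.apply(whisperx_trans, (audio_file,))

if __name__ == "__main__":
    result, diarize_segments = transcribe_in_subprocess("example.wav")
```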
Same issue, except it kicks in directly at transcription and affects all my production files. My error message:
```
Traceback (most recent call last):
  Cell In[33], line 16
    model = whisperx.load_model(my_model, device, compute_type = compute_type, language = my_language)
  File C:\ProgramData\miniconda3\envs\transcription\lib\site-packages\whisperx\asr.py:288 in load_model
    model = model or WhisperModel(whisper_arch,
  File C:\ProgramData\miniconda3\envs\transcription\lib\site-packages\faster_whisper\transcribe.py:130 in __init__
    self.model = ctranslate2.models.Whisper(
RuntimeError: CUDA failed with error out of memory
```
This happens even with a batch size of 2 (!)
Has anyone found a solution?
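A quick check that may help narrow this down: log free vs. total device memory right before `load_model`, to confirm whether the card is already nearly full when the OOM hits. A sketch using PyTorch:

```python
import torch

# mem_get_info returns (free_bytes, total_bytes) for the current CUDA device.
free_bytes, total_bytes = torch.cuda.mem_get_info()
print(f"free: {free_bytes / 1e9:.2f} GB / total: {total_bytes / 1e9:.2f} GB")
```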
Actually, it was working fine on a batch of files of the same duration; after a few hours it started giving this error.
Hello WhisperX developers. Thanks for open-sourcing this code.
Since another similar question was closed without an answer, I will repeat it here.
I'm getting the error `CUDA failed with error out of memory` at the diarization step. It happens only on some files, and they are not especially long: just 11+ minutes. The setups I've tried:

- GPU: nvidia-tesla-t4 and nvidia-tesla-l4
- GPU count: 1 and 2
- GPU memory: 16GB per GPU
- Batch size: 32, 24, 20, and 16*
- Model: large-v2**
- Compute types: int8 and float16
*With batch size 16, transcription takes over a minute and times out; we run it on Google's Vertex AI, which has a hard limit on prediction duration.
**Other models often produce weird results, repeating the same token in place of multiple words, like `7 7 7 7 7...` or `with with with...`. Plain Whisper is subject to the same issue.

Reducing the batch size and using a smaller model would have been a solution, but given these limitations we cannot go that route. I've also tried allocating the alignment and diarization models on the CPU, but that had no effect. I've tried adding garbage collection and `torch.cuda.empty_cache()`, but that also did not help.

Please share any ideas for further improvements I should try. Thank you.
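For reference, this is roughly the cleanup I tried between the transcription and diarization steps (a sketch; `<API_Key>` and the audio path are placeholders):

```python
import gc
import torch
import whisperx

device = "cuda"
audio_file = "example.wav"  # placeholder path
audio = whisperx.load_audio(audio_file)

model = whisperx.load_model("large-v2", device, compute_type="float16")
result = model.transcribe(audio, batch_size=16)

# Release the ASR model before the diarization pipeline is created,
# so the two never occupy the 16GB card at the same time.
del model
gc.collect()
torch.cuda.empty_cache()

diarize_model = whisperx.DiarizationPipeline(use_auth_token="<API_Key>", device=device)
diarize_segments = diarize_model(audio_file)
```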