m-bain / whisperX

WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)

CUDA failed with error out of memory #388

Open zallesov opened 1 year ago

zallesov commented 1 year ago

Hello WhisperX developers. Thanks for open-sourcing this code.

Since another similar question was closed without an answer, I will repeat it.

I'm getting the error CUDA failed with error out of memory at the diarization step. It happens only on some files, and they are not particularly long (just over 11 minutes).

The setups I've tried:

- GPU: nvidia-tesla-t4 and nvidia-tesla-l4
- GPU count: 1 and 2
- GPU memory: 16 GB per GPU
- Batch size: 32, 24, 20, and 16*
- Model: large-v2**
- Compute types: int8 and float16

* With batch size 16, transcription takes over a minute and times out. We run it on Google's Vertex AI, which has a hard limit on prediction duration.
** Other models often produce weird results, replacing multiple words with the same token, like "7 7 7 7 7..." or "with with with...". Plain Whisper is also subject to the same issue. Reducing the batch size and using a smaller model would have been a solution, but given these limitations we cannot go down that path.

I've also tried allocating the alignment and diarization models on the CPU, but that had no effect. I've tried garbage collection and torch.cuda.empty_cache(), but that did not help either.
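For reference, this is roughly what I mean by moving the alignment and diarization models to the CPU. The audio path and HF token are placeholders; the calls themselves are the standard whisperx ones:

import gc

import torch
import whisperx

audio = whisperx.load_audio("example.wav")  # placeholder path

# transcription on the GPU
model = whisperx.load_model("large-v2", "cuda", compute_type="float16")
result = model.transcribe(audio, batch_size=16)

# free the ASR model before the later stages
del model
gc.collect()
torch.cuda.empty_cache()

# alignment and diarization on the CPU instead of the GPU
model_a, metadata = whisperx.load_align_model(language_code="en", device="cpu")
result = whisperx.align(result["segments"], model_a, metadata, audio, "cpu",
                        return_char_alignments=False)

diarize_model = whisperx.DiarizationPipeline(use_auth_token="<HF_TOKEN>", device="cpu")
diarize_segments = diarize_model("example.wav")
result = whisperx.assign_word_speakers(diarize_segments, result)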

Please share any ideas of further improvements I should try. Thank you.

Ntweat commented 1 year ago

Same issue. del model and gc.collect() don't completely free up the GPU. I am looking into it.
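Roughly the cleanup I'm trying after each run, with torch.cuda.empty_cache() added for good measure; the GPU memory still doesn't come back down fully:

import gc
import torch

del model                  # drop the last reference to the loaded model
gc.collect()               # force Python to release the object
torch.cuda.empty_cache()   # ask PyTorch to return cached blocks to the driver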

simonkuang commented 10 months ago

same issue +1

kurianbenoy-sentient commented 8 months ago

I have also noticed GPU memory not getting freed after each inference. Is there any way to clear GPU memory efficiently after each inference run?

Ntweat commented 8 months ago

The way I found to solve the issue is to delete whisperx and re-import it for every file. The code is below:

def whisperx_trans(audio_file):
    # whisperx is imported inside the function so it can be deleted and
    # re-imported for every file
    import gc
    import whisperx

    # number_tokens, batch_size and device are defined elsewhere in my script
    model = whisperx.load_model("large-v2", "cuda",
                                asr_options={"suppress_tokens": [-1] + number_tokens})
    audio = whisperx.load_audio(audio_file)
    result = model.transcribe(audio, batch_size=batch_size)
    print(result["segments"])  # before alignment

    model_a, metadata = whisperx.load_align_model(language_code="en", device=device)
    result2 = whisperx.align(result["segments"], model_a, metadata, audio, device,
                             return_char_alignments=False)
    print(result2["segments"])  # after alignment

    diarize_model = whisperx.DiarizationPipeline(use_auth_token="<API_Key>", device=device)
    diarize_segments = diarize_model(audio_file)

    result4 = whisperx.assign_word_speakers(diarize_segments, result2)
    print(diarize_segments)
    print(result4["segments"])  # segments are now assigned speaker IDs

    print(diarize_segments.head())
    print(diarize_segments.columns)

    # drop every reference to the models and the module, then clear the CUDA cache
    del model
    del model_a
    del diarize_model
    del whisperx
    import torch
    torch.cuda.empty_cache()
    del torch
    gc.collect()
    return result4, diarize_segments
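I then call it once per file, for example (paths here are just placeholders):

for audio_file in ["file_01.wav", "file_02.wav"]:
    result, diarize_segments = whisperx_trans(audio_file)
    # save or post-process the results here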

tomwagstaff-opml commented 7 months ago

Same issue, except it kicks in directly at transcription and affects all my production files. My error message:

Traceback (most recent call last):

  Cell In[33], line 16
    model = whisperx.load_model(my_model, device, compute_type = compute_type, language = my_language)

  File C:\ProgramData\miniconda3\envs\transcription\lib\site-packages\whisperx\asr.py:288 in load_model
    model = model or WhisperModel(whisper_arch,

  File C:\ProgramData\miniconda3\envs\transcription\lib\site-packages\faster_whisper\transcribe.py:130 in __init__
    self.model = ctranslate2.models.Whisper(

RuntimeError: CUDA failed with error out of memory

This happens even with a batch size of 2 (!)
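In case it helps with debugging: a small check I've added (not part of whisperx) to confirm how much GPU memory is actually free right before load_model is called:

import torch

# free and total memory on the current CUDA device, in bytes
free, total = torch.cuda.mem_get_info()
print(f"free: {free / 1e9:.2f} GB of {total / 1e9:.2f} GB")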

Has anyone found a solution?

gamingflexer commented 4 months ago

It was actually working fine on a batch of files of the same duration; after a few hours it started giving this issue.