jianfch / stable-ts

Transcription, forced alignment, and audio indexing with OpenAI's Whisper
MIT License
1.59k stars 177 forks

I can't control device selection when choosing whether to run on GPU or CPU. #203

Closed furqan4545 closed 1 year ago

furqan4545 commented 1 year ago

import whisper, stable_whisper

model = stable_whisper.load_model('large-v2', device="cpu")  # stable whisper

Hi, when I pass cpu as the device, my model is not running on the CPU. It automatically detects the GPU and shifts its inference to the GPU. Can we control the device like we do in Whisper? Your help will be highly appreciated.

The thing is, I am trying to run it on multiple GPUs, so once I can control device selection, I will run each instance of stable-ts on a separate GPU id.
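For example, one way I could pin each process to its own GPU (the env-var approach is just a sketch of the plan, not something stable-ts requires):

import os
os.environ['CUDA_VISIBLE_DEVICES'] = '1'  # set before CUDA initializes; this process only sees GPU 1

import stable_whisper

model = stable_whisper.load_model('large-v2')  # loads on the only visible GPU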

Here is the error that I am getting:

File "/home/azureuser/transpify_whisper_stable_v2/stable-ts/stable_whisper/whisper_word_level.py", line 725, in transcribe_minimal
    return transcribe_any(
  File "/home/azureuser/transpify_whisper_stable_v2/stable-ts/stable_whisper/non_whisper.py", line 317, in transcribe_any
    result = inference_func(**inference_kwargs)
  File "/home/azureuser/myenv2/lib/python3.8/site-packages/whisper/transcribe.py", line 316, in transcribe
    add_word_timestamps(
  File "/home/azureuser/myenv2/lib/python3.8/site-packages/whisper/timing.py", line 303, in add_word_timestamps
    alignment = find_alignment(model, tokenizer, text_tokens, mel, num_frames, **kwargs)
  File "/home/azureuser/myenv2/lib/python3.8/site-packages/whisper/timing.py", line 214, in find_alignment
    text_indices, time_indices = dtw(-matrix)
  File "/home/azureuser/myenv2/lib/python3.8/site-packages/whisper/timing.py", line 151, in dtw
    return dtw_cpu(x.double().cpu().numpy())
RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
jianfch commented 1 year ago

Do you get the same error with just Whisper? If you do, then it is a PyTorch issue. If not, you can try loading the Whisper model and then modifying it:

import whisper, stable_whisper

model = whisper.load_model('large-v2', device="cpu")
stable_whisper.modify_model(model)  # patch the loaded Whisper model in place
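After that, the patched model is used the same way (the file names here are just placeholders):

result = model.transcribe('audio.mp3')
result.to_srt_vtt('audio.srt')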
furqan4545 commented 1 year ago

I tried that, but it seems like there is a memory leak problem. I usually get this error when the model fails to release memory; when I perform inference again, it starts throwing the error I showed in the first message. Also, there is one more issue I saw: for German, it is somehow missing word-level timestamps. Do you have any advice for that too?
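For reference, this is the general pattern I tried for releasing GPU memory between runs (standard PyTorch, not a stable-ts API):

import gc
import torch

del model                 # drop the reference to the loaded model
gc.collect()              # collect lingering Python references
torch.cuda.empty_cache()  # release cached GPU memory back to the driver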

jianfch commented 1 year ago

What settings did you use for transcribe_minimal()? What type was audio (str/np.ndarray/torch.Tensor/bytes)?

furqan4545 commented 1 year ago

Actually, transcribe_minimal() is working OK for now, I guess. With model.transcribe() I am facing one issue: when I load the model on cuda:0, inference and everything work fine, but when I load the model on GPU 1 or any GPU other than 0, it starts throwing the error I posted above. This is strange. Let me show you my parameters.

result = model.transcribe(video_file, mel_first=True, language=lang, suppress_silence=False, ts_num=5, time_scale=1.0, temperature=(0.3, 0.4, 0.5), verbose=True)

This is what I am using.

jianfch commented 1 year ago

I managed to replicate this error when using device='cuda:1', but it works fine when I call cuda(1) on the loaded model:

model = stable_whisper.load_model('base').cuda(1)
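So for the multi-GPU case, that suggests loading one instance per GPU and moving each after loading (a sketch, assuming two GPUs are present):

import stable_whisper

# Load each instance on the default device, then move it to its own GPU,
# instead of passing device='cuda:N' at load time.
models = [stable_whisper.load_model('base').cuda(i) for i in range(2)]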
furqan4545 commented 1 year ago

It worked, thanks a lot man.
