m-bain / whisperX

WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)
BSD 2-Clause "Simplified" License

detect_language() doesn't work on a GPU? #782

Open utility-aagrawal opened 4 months ago

utility-aagrawal commented 4 months ago

Here's a simple script to identify the language of an audio file:

import whisperx
import time

start_time = time.time()
filepath = ""

whisper_model = whisperx.load_model("medium", device="cuda", compute_type="float16")

audio = whisperx.load_audio(filepath)
audio = whisperx.audio.pad_or_trim(audio)

print(f"Language => {whisper_model.detect_language(audio)}")

end_time = time.time()
print(f"Time taken: {end_time - start_time:.2f} seconds")

You can use this file as an input:

https://github.com/m-bain/whisperX/assets/140737044/c912bca2-3b10-4304-846d-4529decacd59

I am getting this error:

Could not load library libcudnn_cnn_infer.so.8. Error: libcudnn_cnn_infer.so.8: cannot open shared object file: No such file or directory
Please make sure libcudnn_cnn_infer.so.8 is in your library path!
Aborted (core dumped)

Can someone tell me why? Let me know if you need anything additional. Thanks!
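One way to narrow this down is to check whether the dynamic loader can find the library named in the error at all, independently of whisperX. The following is a minimal sketch using only the standard library; `can_load_library` is a hypothetical helper introduced here for illustration, not part of whisperX or its dependencies.

```python
import ctypes


def can_load_library(name: str) -> bool:
    """Return True if the dynamic loader can open the shared library `name`.

    Hypothetical helper for illustration; it only asks the loader to open
    the file and does not check that the library version is compatible.
    """
    try:
        ctypes.CDLL(name)
        return True
    except OSError:
        # Raised when the library is missing from the loader's search path.
        return False


if __name__ == "__main__":
    # Library name copied from the error message in the post above.
    lib = "libcudnn_cnn_infer.so.8"
    print(f"{lib} loadable: {can_load_library(lib)}")
```

If this prints False in the same environment where the script aborts, the problem is likely the library search path rather than anything specific to detect_language().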

utility-aagrawal commented 4 months ago

What I don't understand is why transcribe() works when everything else in the code is unchanged.

This code works:

import whisperx
import time

start_time = time.time()
filepath = ""

whisper_model = whisperx.load_model("medium", device="cuda", compute_type="float16")

audio = whisperx.load_audio(filepath)
audio = whisperx.audio.pad_or_trim(audio)

results = whisper_model.transcribe(audio)
print(results)

end_time = time.time()
print(f"Time taken: {end_time - start_time:.2f} seconds")

transcribe() calls the same detect_language() method, as you can see here: https://github.com/m-bain/whisperX/blob/f2da2f858e99e4211fe4f64b5f2938b007827e17/whisperx/asr.py#L194

Yet transcribe() doesn't throw the same error.

candrascik commented 3 months ago

Adding the following to my Dockerfile fixed this issue. Make sure the nvidia-cudnn-cu12 version is below 9.

RUN pip install nvidia-cudnn-cu12==8.9.7.29
ENV LD_LIBRARY_PATH /usr/local/lib/python3.10/dist-packages/nvidia/cudnn/lib:$LD_LIBRARY_PATH
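The path above is specific to one Python version and install location. As a rough sketch, the directory could instead be looked up from the installed package; this assumes the pip wheel places its shared libraries under `nvidia/cudnn/lib`, as the path above suggests, and `find_package_lib_dir` is a hypothetical helper name, not an existing API.

```python
import importlib.util
import os
from typing import Optional


def find_package_lib_dir(package: str = "nvidia.cudnn") -> Optional[str]:
    """Return the `lib` directory inside an installed package, or None.

    Hypothetical helper: assumes the package keeps its shared libraries
    in a `lib` subdirectory, matching the path used in the Dockerfile.
    """
    try:
        spec = importlib.util.find_spec(package)
    except (ImportError, ValueError):
        # The parent package (or the package itself) is not installed.
        return None
    if spec is None or not spec.submodule_search_locations:
        return None
    for location in spec.submodule_search_locations:
        lib_dir = os.path.join(location, "lib")
        if os.path.isdir(lib_dir):
            return lib_dir
    return None


if __name__ == "__main__":
    # Prints the directory to add to LD_LIBRARY_PATH, or None if not found.
    print(find_package_lib_dir())
```

The printed directory would still need to be added to LD_LIBRARY_PATH before the Python process starts (for example in the Dockerfile or the shell), since on Linux the loader generally reads that variable at process startup.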