Open utility-aagrawal opened 7 months ago
What I don't understand is why transcribe() works when everything else in the code is unchanged.
This code works:
import whisperx
import time
start_time = time.time()
filepath = ""
whisper_model = whisperx.load_model("medium", device="cuda", compute_type="float16")
audio = whisperx.load_audio(filepath)
audio = whisperx.audio.pad_or_trim(audio)
results = whisper_model.transcribe(audio)
print(results)
end_time = time.time()
print(f"Time taken: {end_time - start_time:.2f} seconds")
transcribe() also calls the same detect_language() method, as you can see here: https://github.com/m-bain/whisperX/blob/f2da2f858e99e4211fe4f64b5f2938b007827e17/whisperx/asr.py#L194
Yet the transcribe() call doesn't throw the same error.
Adding the following to my Dockerfile fixed this issue. Make sure the nvidia-cudnn-cu12 version is below 9.
RUN pip install nvidia-cudnn-cu12==8.9.7.29
ENV LD_LIBRARY_PATH=/usr/local/lib/python3.10/dist-packages/nvidia/cudnn/lib:$LD_LIBRARY_PATH
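To confirm which cuDNN wheel actually ended up in the image, something like the following could be run inside the container. This is only a sketch: the helper names `cudnn_wheel_major` and `is_cudnn8` are made up for illustration, and it inspects the pip-installed `nvidia-cudnn-cu12` package only, so a system-wide cuDNN install would not be detected.

```python
from importlib.metadata import PackageNotFoundError, version


def cudnn_wheel_major(package="nvidia-cudnn-cu12"):
    """Return the major version of the installed pip package, or None if absent.

    Hypothetical helper: it only looks at pip metadata, not at system libraries.
    """
    try:
        # A version string such as "8.9.7.29" yields 8.
        return int(version(package).split(".")[0])
    except PackageNotFoundError:
        return None


def is_cudnn8(major):
    """True when a wheel is installed and its major version is below 9."""
    return major is not None and major < 9


if __name__ == "__main__":
    major = cudnn_wheel_major()
    print(f"nvidia-cudnn-cu12 major version: {major}")
    print(f"Below 9: {is_cudnn8(major)}")
```

If this reports `None`, the wheel is not installed in the Python environment being used, and the `pip install` line above has not taken effect there.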
Here's a simple script to identify language from an audio:
import whisperx
import time
start_time = time.time()
filepath = ""
whisper_model = whisperx.load_model("medium", device="cuda", compute_type="float16")
audio = whisperx.load_audio(filepath)
audio = whisperx.audio.pad_or_trim(audio)
print(f"Language => {whisper_model.detect_language(audio)}")
end_time = time.time()
print(f"Time taken: {end_time - start_time:.2f} seconds")
You can use this file as an input:
https://github.com/m-bain/whisperX/assets/140737044/c912bca2-3b10-4304-846d-4529decacd59
I am getting this error:
Could not load library libcudnn_cnn_infer.so.8. Error: libcudnn_cnn_infer.so.8: cannot open shared object file: No such file or directory
Please make sure libcudnn_cnn_infer.so.8 is in your library path!
Aborted (core dumped)
Can someone tell me why? Let me know if you need anything additional. Thanks!
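For anyone hitting the same message, a quick way to see whether the dynamic loader can find the library is to check the directories listed in LD_LIBRARY_PATH and then try to load the file directly. The sketch below uses only the standard library. The function names `find_library_dirs` and `can_load` are illustrative, and the check assumes a Linux system where LD_LIBRARY_PATH is the relevant search path. Directories outside that variable, such as those in the ldconfig cache, are only covered by the load attempt.

```python
import ctypes
import os
from pathlib import Path


def find_library_dirs(lib_name, search_path):
    """Return the directories in search_path that contain a file named lib_name.

    search_path is a string of directories joined by os.pathsep (":" on Linux).
    """
    hits = []
    for entry in search_path.split(os.pathsep):
        if entry and (Path(entry) / lib_name).is_file():
            hits.append(entry)
    return hits


def can_load(lib_name):
    """Try to load the shared library by name; return False if the loader fails."""
    try:
        ctypes.CDLL(lib_name)
        return True
    except OSError:
        return False


if __name__ == "__main__":
    name = "libcudnn_cnn_infer.so.8"
    dirs = find_library_dirs(name, os.environ.get("LD_LIBRARY_PATH", ""))
    print(f"Found in LD_LIBRARY_PATH directories: {dirs}")
    print(f"Loadable by the dynamic loader: {can_load(name)}")
```

An empty list together with `False` would suggest the library is either not installed or its directory is missing from the search path.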