m-bain / whisperX

WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)
BSD 2-Clause "Simplified" License
12.39k stars 1.3k forks

ValueError: Requested float16 compute type, but the target device or backend do not support efficient float16 computation. #878

Open kc01-8 opened 2 months ago

kc01-8 commented 2 months ago
PS F:\whisperX-main> whisperx audio.mp4 --model large-v2 --diarize --highlight_words True --min_speakers 5 --max_speakers 5 --hf_token hf_x
C:\Users\kc01\AppData\Roaming\Python\Python310\site-packages\pyannote\audio\core\io.py:43: UserWarning: torchaudio._backend.set_audio_backend has been deprecated. With dispatcher enabled, this function is no-op. You can remove the function call.
  torchaudio.set_audio_backend("soundfile")
Traceback (most recent call last):
  File "C:\Program Files\Python310\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Program Files\Python310\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "C:\Users\kc01\AppData\Roaming\Python\Python310\Scripts\whisperx.exe\__main__.py", line 7, in <module>
  File "C:\Users\kc01\AppData\Roaming\Python\Python310\site-packages\whisperx\transcribe.py", line 170, in cli
    model = load_model(model_name, device=device, device_index=device_index, download_root=model_dir, compute_type=compute_type, language=args['language'], asr_options=asr_options, vad_options={"vad_onset": vad_onset, "vad_offset": vad_offset}, task=task, threads=faster_whisper_threads)
  File "C:\Users\kc01\AppData\Roaming\Python\Python310\site-packages\whisperx\asr.py", line 288, in load_model
    model = model or WhisperModel(whisper_arch,
  File "C:\Users\kc01\AppData\Roaming\Python\Python310\site-packages\faster_whisper\transcribe.py", line 133, in __init__
    self.model = ctranslate2.models.Whisper(
ValueError: Requested float16 compute type, but the target device or backend do not support efficient float16 computation.

Happens using a 3080ti, which works flawlessly with NVidia NeMo. Completely fresh install of whisperx.

Hasan-Naseer commented 2 months ago
> Happens using a 3080ti, which works flawlessly with NVidia NeMo. Completely fresh install of whisperx.

What device are you passing? Are you sure it's 'GPU' and not 'CPU'? If I recall correctly, this was a CPU-only problem, not with whisperx itself but with faster-whisper under the hood. See, for example, this issue: https://github.com/SYSTRAN/faster-whisper/issues/65
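One quick way to narrow it down is to ask CTranslate2 directly which compute types it reports for each device, since that is the library raising the error. A sketch using ctranslate2's `get_supported_compute_types` helper (the wrapper function and its fallback behaviour are my own):

```python
# Query CTranslate2 for the compute types it supports on a device.
# If "float16" is missing from the "cuda" set, that explains the error.
def supported_compute_types(device):
    """Return the compute types ctranslate2 reports for `device`,
    or None if the check cannot run (missing package or device)."""
    try:
        import ctranslate2
        return set(ctranslate2.get_supported_compute_types(device))
    except Exception:
        return None

if __name__ == "__main__":
    for dev in ("cpu", "cuda"):
        print(dev, "->", supported_compute_types(dev))
```

If float16 is not listed for "cuda" even though a 3080 Ti is present, that points at a CPU-only or mismatched ctranslate2 build. As a stopgap, passing `--compute_type float32` (or `int8`) to whisperx usually sidesteps the error, at the cost of speed/memory.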

If you are indeed passing the correct parameters for GPU use, then I recommend running faster-whisper directly first to narrow down the problem. Make a .py file with the necessary starter code (you can find it on faster-whisper's GitHub) and run it with the verbose flag set:
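Something along these lines is enough for main.py (adapted from the faster-whisper README starter code; the model name and audio path just mirror the command above):

```python
# main.py -- minimal faster-whisper repro, kept separate from whisperx.
def run(audio_path="audio.mp4"):
    from faster_whisper import WhisperModel

    # Same settings whisperx requests by default: GPU + float16.
    model = WhisperModel("large-v2", device="cuda", compute_type="float16")
    segments, info = model.transcribe(audio_path)
    for seg in segments:
        print(f"[{seg.start:.2f}s -> {seg.end:.2f}s] {seg.text}")

if __name__ == "__main__":
    run()
```

If this minimal script raises the same ValueError, the problem is entirely in faster-whisper/ctranslate2 and not in whisperx.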

CT2_VERBOSE=1 time python3 main.py

should give more console output for debugging.

We can proceed from there to see what's wrong.