m-bain / whisperX

WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)
BSD 2-Clause "Simplified" License

ValueError: Unknown backend. #526

Open DogeLord081 opened 11 months ago

DogeLord081 commented 11 months ago

Code:

import whisperx
import gc 

device = "cuda" 
audio_file = r"out.wav"
batch_size = 16 # reduce if low on GPU mem
compute_type = "float16" # change to "int8" if low on GPU mem (may reduce accuracy)

# 1. Transcribe with original whisper (batched)
model = whisperx.load_model("large-v2", device, compute_type=compute_type)

audio = whisperx.load_audio(audio_file)
result = model.transcribe(audio, batch_size=batch_size)
print(result["segments"]) # before alignment

Error:

WARNING:root:Limited tf.compat.v2.summary API due to missing TensorBoard installation.
WARNING:root:Limited tf.compat.v2.summary API due to missing TensorBoard installation.
WARNING:root:Limited tf.compat.v2.summary API due to missing TensorBoard installation.
WARNING:root:Limited tf.summary API due to missing TensorBoard installation.
WARNING:root:Limited tf.compat.v2.summary API due to missing TensorBoard installation.
WARNING:root:Limited tf.compat.v2.summary API due to missing TensorBoard installation.
WARNING:root:Limited tf.compat.v2.summary API due to missing TensorBoard installation.
C:\Users\danu0\AppData\Local\Programs\Python\Python310\lib\site-packages\torchaudio\backend\utils.py:48: UserWarning: set_audio_backend is a no-op when the I/O backend dispatcher is enabled.
  warnings.warn("set_audio_backend is a no-op when the I/O backend dispatcher is enabled.")
Traceback (most recent call last):
  File "c:\Users\danu0\Downloads\OneReality\test4.py", line 1, in <module>
    import whisperx
  File "C:\Users\danu0\AppData\Local\Programs\Python\Python310\lib\site-packages\whisperx\__init__.py", line 1, in <module>
    from .transcribe import load_model
  File "C:\Users\danu0\AppData\Local\Programs\Python\Python310\lib\site-packages\whisperx\transcribe.py", line 10, in <module>
    from .asr import load_model
  File "C:\Users\danu0\AppData\Local\Programs\Python\Python310\lib\site-packages\whisperx\asr.py", line 13, in <module>
    from .vad import load_vad_model, merge_chunks
  File "C:\Users\danu0\AppData\Local\Programs\Python\Python310\lib\site-packages\whisperx\vad.py", line 11, in <module>
    from pyannote.audio.pipelines import VoiceActivityDetection
  File "C:\Users\danu0\AppData\Local\Programs\Python\Python310\lib\site-packages\pyannote\audio\pipelines\__init__.py", line 26, in <module>
    from .speaker_diarization import SpeakerDiarization
  File "C:\Users\danu0\AppData\Local\Programs\Python\Python310\lib\site-packages\pyannote\audio\pipelines\speaker_diarization.py", line 40, in <module>
    from pyannote.audio.pipelines.speaker_verification import PretrainedSpeakerEmbedding
  File "C:\Users\danu0\AppData\Local\Programs\Python\Python310\lib\site-packages\pyannote\audio\pipelines\speaker_verification.py", line 43, in <module>
    backend = torchaudio.get_audio_backend()
  File "C:\Users\danu0\AppData\Local\Programs\Python\Python310\lib\site-packages\torchaudio\backend\utils.py", line 93, in get_audio_backend
    raise ValueError("Unknown backend.")
ValueError: Unknown backend.

sorgfresser commented 11 months ago

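That one is caused by torchaudio not detecting a suitable audio backend on Windows: the traceback ends in torchaudio.get_audio_backend(), called while pyannote.audio is being imported, so the failure happens before any whisperX code runs. Maybe try raising the issue over there (https://github.com/pytorch/audio)?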
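Before filing upstream, it may help to collect a few facts about the local environment. The following is a minimal diagnostic sketch, not a fix. The helper name diagnose_audio_backends is hypothetical, and the sketch assumes that the installed torchaudio exposes list_audio_backends() at the top level (it falls back to None when it does not) and that a missing soundfile package is a plausible reason for no backend being found on Windows.

```python
import importlib.util
import platform


def diagnose_audio_backends(torchaudio_module=None):
    """Collect basic facts about the torchaudio I/O backends (hypothetical helper)."""
    report = {
        "platform": platform.system(),
        # Assumption: the soundfile package is what torchaudio would use on Windows.
        "soundfile_installed": importlib.util.find_spec("soundfile") is not None,
        "torchaudio_version": None,
        "backends": None,
    }
    if torchaudio_module is None:
        try:
            import torchaudio as torchaudio_module
        except (ImportError, OSError):
            # torchaudio is missing or failed to load; report what we have.
            return report
    report["torchaudio_version"] = getattr(torchaudio_module, "__version__", None)
    # list_audio_backends may not exist in every torchaudio release.
    list_backends = getattr(torchaudio_module, "list_audio_backends", None)
    if callable(list_backends):
        report["backends"] = list(list_backends())
    return report


if __name__ == "__main__":
    for key, value in diagnose_audio_backends().items():
        print(f"{key}: {value}")
```

If backends comes back empty and soundfile_installed is False, installing soundfile (pip install soundfile) is one thing to try. Including the printed report in the upstream issue should also make it easier to reproduce.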