m-bain / whisperX

WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)
BSD 2-Clause "Simplified" License

CuFFT issue with Cuda 11.7 on RTX6000 ADA #254

Closed by tijszwinkels 1 year ago

tijszwinkels commented 1 year ago
Traceback (most recent call last):
  File "/root/anaconda3/envs/whisperx/bin/whisperx", line 8, in <module>
    sys.exit(cli())
  File "/root/anaconda3/envs/whisperx/lib/python3.10/site-packages/whisperx/transcribe.py", line 199, in cli
    diarize_segments = diarize_model(input_audio_path, min_speakers=min_speakers, max_speakers=max_speakers)
  File "/root/anaconda3/envs/whisperx/lib/python3.10/site-packages/whisperx/diarize.py", line 19, in __call__
    segments = self.model(audio, min_speakers=min_speakers, max_speakers=max_speakers)
  File "/root/anaconda3/envs/whisperx/lib/python3.10/site-packages/pyannote/audio/core/pipeline.py", line 324, in __call__
    return self.apply(file, **kwargs)
  File "/root/anaconda3/envs/whisperx/lib/python3.10/site-packages/pyannote/audio/pipelines/speaker_diarization.py", line 496, in apply
    embeddings = self.get_embeddings(
  File "/root/anaconda3/envs/whisperx/lib/python3.10/site-packages/pyannote/audio/pipelines/speaker_diarization.py", line 337, in get_embeddings
    embedding_batch: np.ndarray = self._embedding(
  File "/root/anaconda3/envs/whisperx/lib/python3.10/site-packages/pyannote/audio/pipelines/speaker_verification.py", line 363, in __call__
    self.classifier_.encode_batch(signals, wav_lens=wav_lens)
  File "/root/anaconda3/envs/whisperx/lib/python3.10/site-packages/speechbrain/pretrained/interfaces.py", line 943, in encode_batch
    feats = self.mods.compute_features(wavs)
  File "/root/anaconda3/envs/whisperx/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/anaconda3/envs/whisperx/lib/python3.10/site-packages/speechbrain/lobes/features.py", line 138, in forward
    STFT = self.compute_STFT(wav)
  File "/root/anaconda3/envs/whisperx/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/anaconda3/envs/whisperx/lib/python3.10/site-packages/speechbrain/processing/features.py", line 147, in forward
    stft = torch.stft(
  File "/root/anaconda3/envs/whisperx/lib/python3.10/site-packages/torch/functional.py", line 641, in stft
    return _VF.stft(input, n_fft, hop_length, win_length, window,  # type: ignore[attr-defined]
RuntimeError: cuFFT error: CUFFT_INTERNAL_ERROR

This problem is being discussed here: https://github.com/pytorch/pytorch/issues/88038

The discussion there suggests this is a bug in CUDA 11.7. Updating to CUDA 11.8 fixed the problem in my testing.
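For anyone who wants to check their own setup, here is a rough sketch that reports which CUDA version the installed PyTorch build targets and runs a small `torch.stft` on the GPU, the same call that fails in the traceback above. The helper names `cuda_build_is_affected` and `stft_smoke_test` are made up for illustration. Treating every build older than 11.8 as affected is an assumption based only on this thread. The signal length and STFT sizes are arbitrary.

```python
from typing import Optional

try:
    import torch  # optional: only needed for the GPU smoke test
except ImportError:
    torch = None


def cuda_build_is_affected(cuda_version: Optional[str]) -> bool:
    """Return True if the CUDA version string is older than 11.8.

    Assumption: 11.8 is used as the cutoff because this thread reports
    the error on 11.7 and a fix on 11.8. None means a CPU-only build.
    """
    if cuda_version is None:
        return False
    major, minor = (int(part) for part in cuda_version.split(".")[:2])
    return (major, minor) < (11, 8)


def stft_smoke_test() -> None:
    """Run a tiny STFT on the GPU to see whether cuFFT works at all."""
    if torch is None or not torch.cuda.is_available():
        print("PyTorch with a CUDA device is not available; skipping.")
        return

    print("PyTorch CUDA build:", torch.version.cuda)
    if cuda_build_is_affected(torch.version.cuda):
        print("This build predates CUDA 11.8 and may hit the cuFFT error.")

    # Arbitrary test signal: one second of noise at 16 kHz.
    signal = torch.randn(16000, device="cuda")
    window = torch.hann_window(400, device="cuda")
    spec = torch.stft(
        signal,
        n_fft=400,
        hop_length=160,
        window=window,
        return_complex=True,
    )
    print("STFT succeeded, output shape:", tuple(spec.shape))


if __name__ == "__main__":
    stft_smoke_test()
```

If the script raises the same `CUFFT_INTERNAL_ERROR`, the problem is reproducible outside whisperX, which points at the PyTorch/CUDA build rather than the diarization pipeline.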

sollipse commented 1 year ago

+1 validated on my 4090