m-bain / whisperX

WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)
BSD 2-Clause "Simplified" License
10.11k stars 1.06k forks

[Feature Request] Support M1 Mac's GPU #109

Open 0x1FFFFF opened 1 year ago

0x1FFFFF commented 1 year ago

If I pass `mps` to the `--device` option, it crashes. It would be wonderful if the M1 GPU could be supported.


❯ whisperx assets/test.mp3 --device mps --model large-v2 --vad_filter --align_model WAV2VEC2_ASR_LARGE_LV60K_960H --diarize --hf_token token --language en

torchvision is not available - cannot save figures
Performing VAD...
~~ Transcribing VAD chunk: (01:07:46.006 --> 01:07:50.663) ~~
loc("mps_multiply"("(mpsFileLoc): /AppleInternal/Library/BuildRoots/9e200cfa-7d96-11ed-886f-a23c4f261b56/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShadersGraph/mpsgraph/MetalPerformanceShadersGraph/Core/Files/MPSGraphUtilities.mm":228:0)): error: input types 'tensor<1x1280x3000xf16>' and 'tensor<1xf32>' are not broadcast compatible
LLVM ERROR: Failed to infer result type(s).
[1]    97334 abort      whisperx assets/test.mp3 --device mps --model large-v2 --vad_filter       en
/opt/homebrew/Cellar/python@3.10/3.10.9/Frameworks/Python.framework/Versions/3.10/lib/python3.10/multiprocessing/resource_tracker.py:224: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '
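The failure above is a mixed-precision broadcast problem: per the error message, a `tensor<1x1280x3000xf16>` is being multiplied by a `tensor<1xf32>`, and the Metal graph compiler refuses to broadcast across dtypes. A minimal sketch of the pattern (NumPy stands in for torch here; the shapes are taken from the error, the values are made up):

```python
import numpy as np

mel = np.zeros((1, 1280, 3000), dtype=np.float16)  # like tensor<1x1280x3000xf16>
scale = np.asarray([0.5], dtype=np.float32)        # like tensor<1xf32>

# NumPy (and CUDA PyTorch) silently promote mixed dtypes; the MPS graph
# compiler instead aborts. Casting both operands to one dtype before the
# multiply is the usual sidestep.
out = mel * scale.astype(mel.dtype)
print(out.dtype)
```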
wllbll commented 1 year ago

Yeah, I got this error too. It seems we need to wait for an update from PyTorch.

skye-repos commented 1 year ago

Hi, I believe PyTorch has support for most of the functions now. Plus, it can be run with the environment variable PYTORCH_ENABLE_MPS_FALLBACK=1 so that the functions that aren't supported yet fall back to the CPU.

I'm running pyannote and other projects with PyTorch compiled with MPS support, so this should also be doable.
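The fallback mentioned above can be sketched as follows. The device-selection part is illustrative, not WhisperX code; the torch import is guarded so the sketch also runs on machines without PyTorch installed:

```python
import os

# Opt in to PyTorch's CPU fallback for operators not yet implemented on
# the MPS backend. This must be set before torch is imported, so do it
# at the very top of the script (or on the command line).
os.environ["PYTORCH_ENABLE_MPS_FALLBACK"] = "1"

# Hypothetical device selection: torch.backends.mps.is_available() is
# the real check for Apple-Silicon GPU support.
try:
    import torch
    device = "mps" if torch.backends.mps.is_available() else "cpu"
except ImportError:
    device = "cpu"

print(device)
```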

Herb-sh commented 9 months ago

@Harith163 That is correct. PyTorch has implemented "mps" (Metal Performance Shaders) for at least a year now, and OpenAI's Whisper supports "mps" as well, but faster-whisper, used by WhisperX, apparently only supports "cpu" and "cuda".

I myself get "unsupported device mps", here is the error:

```
--> 128 self.model = ctranslate2.models.Whisper(
    129     model_path,
    130     device=device,
    131     device_index=device_index,
    132     compute_type=compute_type,
    133     intra_threads=cpu_threads,
    134     inter_threads=num_workers,
    135 )
```

For me, on a Mac M1, "cpu" is extremely slow, to the point that I have not been able to get a proper transcription.

Is there any workaround for this issue? I believe this is essential for Mac users.

Thanks :)
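The "unsupported device mps" error above comes from CTranslate2 validating the device string at model construction. Until CTranslate2 gains a Metal backend, one hedged workaround is to coerce "mps" to "cpu" before the string reaches faster-whisper. The helper below is purely illustrative and not part of WhisperX's API:

```python
def resolve_ct2_device(requested: str) -> str:
    """Map a device string to one CTranslate2 accepts.

    CTranslate2 only understands "cpu", "cuda", and "auto"; anything
    else (including "mps") would raise inside ctranslate2.models.Whisper.
    """
    supported = {"cpu", "cuda", "auto"}
    if requested in supported:
        return requested
    # Fall back to CPU instead of crashing on "unsupported device mps".
    return "cpu"

print(resolve_ct2_device("mps"))
print(resolve_ct2_device("cuda"))
```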

7k50 commented 9 months ago


Whisper.cpp is fast on Apple Silicon ("Plain C/C++ implementation without dependencies" … "optimized via ARM NEON, Accelerate framework, Metal and Core ML"). However, I believe it only supports very rudimentary diarization currently.

Ideally, WhisperX's solutions for diarization, etc., could be made to work in the fashion of Whisper.cpp.

PurpShell commented 8 months ago

> Ideally, WhisperX's solutions for diarization, etc, could be made to work in the fashion of Whisper.cpp.

That's only in the ideal case: whisper.cpp's creator has shown interest in WhisperX's killer features but stated they are not coming any time soon.

I'd rather fix WhisperX to work better on M1/Apple Silicon.

1Dbcj commented 6 months ago

Just wanted to second this. I love whisperx on my PC, but on Mac it is just so slow.

It's resulted in fragmentation: if I want my script to be universal, I have to look elsewhere. I really wish this could be supported.

AdrienLF commented 3 months ago

Any progress on this so far?