kadirnar / whisper-plus

WhisperPlus: Faster, Smarter, and More Capable 🚀
Apache License 2.0

Can't run Diarization on MPS device #71

Closed jeanjerome closed 4 months ago

jeanjerome commented 5 months ago

I could not run the Speaker Diarization example from the README on an Apple MPS device.

I got this error and don't know how to fix it:


% python app-plus.py
/opt/homebrew/Caskroom/miniconda/base/envs/vox-catalyst/lib/python3.11/site-packages/transformers/utils/generic.py:441: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
  _torch_pytree._register_pytree_node(
/opt/homebrew/Caskroom/miniconda/base/envs/vox-catalyst/lib/python3.11/site-packages/transformers/utils/generic.py:309: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
  _torch_pytree._register_pytree_node(
/opt/homebrew/Caskroom/miniconda/base/envs/vox-catalyst/lib/python3.11/site-packages/transformers/utils/generic.py:309: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
  _torch_pytree._register_pytree_node(
/opt/homebrew/Caskroom/miniconda/base/envs/vox-catalyst/lib/python3.11/site-packages/pyannote/audio/core/io.py:43: UserWarning: torchaudio._backend.set_audio_backend has been deprecated. With dispatcher enabled, this function is no-op. You can remove the function call.
  torchaudio.set_audio_backend("soundfile")
2024-04-09 11:17:30,670 - INFO - Downloading started... output/test.mp3
2024-04-09 11:18:41,270 - INFO - Download and conversion successful. File saved at: output/test.mp3
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
2024-04-09 11:18:51,055 - INFO - Lightning automatically upgraded your loaded checkpoint from v1.5.4 to v2.2.1. To apply the upgrade to your files permanently, run `python -m pytorch_lightning.utilities.upgrade_checkpoint ../../.cache/torch/pyannote/models--pyannote--segmentation/snapshots/c4c8ceafcbb3a7a280c2d357aee9fbc9b0be7f9b/pytorch_model.bin`
Model was trained with pyannote.audio 0.0.1, yours is 3.1.0. Bad things might happen unless you revert pyannote.audio to 0.x.
Model was trained with torch 1.10.0+cu102, yours is 2.2.2. Bad things might happen unless you revert torch to 1.x.
Traceback (most recent call last):
  File "/Users/jeanjerome/PROJETS/voxcatalyst/app-plus.py", line 10, in <module>
    pipeline = ASRDiarizationPipeline.from_pretrained(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Caskroom/miniconda/base/envs/vox-catalyst/lib/python3.11/site-packages/whisperplus/pipelines/whisper_diarize.py", line 43, in from_pretrained
    diarization_pipeline = Pipeline.from_pretrained(diarizer_model, use_auth_token=use_auth_token)
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Caskroom/miniconda/base/envs/vox-catalyst/lib/python3.11/site-packages/pyannote/audio/core/pipeline.py", line 136, in from_pretrained
    pipeline = Klass(**params)
               ^^^^^^^^^^^^^^^
  File "/opt/homebrew/Caskroom/miniconda/base/envs/vox-catalyst/lib/python3.11/site-packages/pyannote/audio/pipelines/speaker_diarization.py", line 167, in __init__
    self._embedding = PretrainedSpeakerEmbedding(
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Caskroom/miniconda/base/envs/vox-catalyst/lib/python3.11/site-packages/pyannote/audio/pipelines/speaker_verification.py", line 754, in PretrainedSpeakerEmbedding
    return SpeechBrainPretrainedSpeakerEmbedding(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Caskroom/miniconda/base/envs/vox-catalyst/lib/python3.11/site-packages/pyannote/audio/pipelines/speaker_verification.py", line 245, in __init__
    raise ImportError(
ImportError: 'speechbrain' must be installed to use 'speechbrain/spkrec-ecapa-voxceleb' embeddings. Visit https://speechbrain.github.io for installation instructions.

Also speechbrain is installed:

% pip list | grep speechbrain
speechbrain               1.0.0

And the HF token is passed in the use_auth_token argument...

Any idea? Thanks for your response... and your great work!

jeanjerome commented 5 months ago

Same error when device is set to cpu.

jeanjerome commented 5 months ago

This seems to originate from the pyannote or speechbrain libraries, as indicated by the issue https://github.com/pyannote/pyannote-audio/issues/1677. A workaround is to run pip install speechbrain==0.5.16.
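
To check which speechbrain release an environment actually has before applying the pin, something like the sketch below may help. The helper name `speechbrain_needs_downgrade` is hypothetical. Treating every 1.x release as affected is an assumption drawn only from this report (1.0.0 failed, 0.5.16 worked), not something confirmed by pyannote or speechbrain.

```python
from importlib.metadata import PackageNotFoundError, version


def speechbrain_needs_downgrade(installed=None):
    """Return True if speechbrain is at major version 1 or later.

    Assumption (from this thread only): 1.0.0 triggers the ImportError in
    pyannote.audio 3.1.0, while 0.5.16 does not.
    """
    if installed is None:
        try:
            installed = version("speechbrain")
        except PackageNotFoundError:
            # Not installed at all, which is a different problem.
            return False
    major = int(installed.split(".")[0])
    return major >= 1


if __name__ == "__main__":
    if speechbrain_needs_downgrade():
        print("Consider: pip install speechbrain==0.5.16")
```

Passing a version string explicitly, for example `speechbrain_needs_downgrade("0.5.16")`, skips the lookup and only runs the comparison.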

It now works with the cpu device on an Apple silicon Mac, but I now get this error with the mps device:

Whisper did not predict an ending timestamp, which can happen if audio is cut off in the middle of a word. Also make sure WhisperTimeStampLogitsProcessor was used during generation.
Traceback (most recent call last):
  File "/Users/jeanjerome/PROJETS/voxcatalyst/app-plus.py", line 18, in <module>
    output_text = pipeline(audio_path, num_speakers=2)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Caskroom/miniconda/base/envs/vox-catalyst/lib/python3.11/site-packages/whisperplus/pipelines/whisper_diarize.py", line 171, in __call__
    upto_idx = np.argmin(np.abs(end_timestamps - end_time))
                                ~~~~~~~~~~~~~~~^~~~~~~~~~
TypeError: unsupported operand type(s) for -: 'NoneType' and 'float'
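
The TypeError above is consistent with the last transcript chunk having an end timestamp of None, which the preceding Whisper warning also suggests. A possible guard is sketched below. It assumes the chunks are dictionaries with a "timestamp" (start, end) tuple, as the Hugging Face ASR pipeline returns with timestamps enabled. The helper name `fill_missing_end_timestamps` and the `fallback_end` parameter (for example, the audio duration in seconds) are hypothetical. This only avoids the crash. It does not explain why the mps device produces the missing timestamp.

```python
def fill_missing_end_timestamps(chunks, fallback_end):
    """Return copies of the chunks with any None end timestamp replaced.

    `fallback_end` is a caller-supplied value in seconds (hypothetical
    parameter), e.g. the total duration of the audio file.
    """
    fixed = []
    for chunk in chunks:
        start, end = chunk["timestamp"]
        if end is None:
            # Never place the end before the start of the chunk.
            end = max(start, fallback_end) if start is not None else fallback_end
        fixed.append({**chunk, "timestamp": (start, end)})
    return fixed


# Example input shaped like the assumed ASR output; the last chunk is cut off.
chunks = [
    {"text": " Hello.", "timestamp": (0.0, 1.5)},
    {"text": " cut off", "timestamp": (1.5, None)},
]
fixed = fill_missing_end_timestamps(chunks, fallback_end=2.0)
```

Applying this to the transcript before the end timestamps are collected into an array would keep the subtraction on numeric values only.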

kadirnar commented 4 months ago

Thank you for reporting the error and the workaround. I will test it. @jeanjerome