pyannote / pyannote-audio

Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding
http://pyannote.github.io
MIT License

clustering must be one of [AgglomerativeClustering, OracleClustering] #1641

Status: Closed (andysingal closed this issue 7 months ago)

andysingal commented 7 months ago

Tested versions

%pip install -q "librosa>=0.8.1" "matplotlib<3.8" "ruamel.yaml>=0.17.8,<0.17.29" pyannote.audio "openvino>=2023.1.0"

# Log in to the Hugging Face Hub to get access to the pre-trained model
from huggingface_hub import notebook_login, whoami

# try:
#     whoami()
#     print('Authorization token already provided')
# except OSError:
notebook_login()

from pyannote.audio import Pipeline
pipeline = Pipeline.from_pretrained("philschmid/pyannote-speaker-diarization-endpoint")

ERROR

/usr/local/lib/python3.10/dist-packages/pyannote/audio/core/io.py:43: UserWarning: torchaudio._backend.set_audio_backend has been deprecated. With dispatcher enabled, this function is no-op. You can remove the function call.
  torchaudio.set_audio_backend("soundfile")
/usr/local/lib/python3.10/dist-packages/torch_audiomentations/utils/io.py:27: UserWarning: torchaudio._backend.set_audio_backend has been deprecated. With dispatcher enabled, this function is no-op. You can remove the function call.
  torchaudio.set_audio_backend("soundfile")
config.yaml: 100%
601/601 [00:00<00:00, 40.7kB/s]
pytorch_model.bin: 100%
17.7M/17.7M [00:00<00:00, 62.5MB/s]
config.yaml: 100%
318/318 [00:00<00:00, 23.5kB/s]
INFO:pytorch_lightning.utilities.migration.utils:Lightning automatically upgraded your loaded checkpoint from v1.5.4 to v2.1.4. To apply the upgrade to your files permanently, run `python -m pytorch_lightning.utilities.upgrade_checkpoint ../root/.cache/torch/pyannote/models--philschmid--pyannote-segmentation/snapshots/d13283ce236dd60ccbe5dafe181ae8fc57e85bfb/pytorch_model.bin`
Model was trained with pyannote.audio 0.0.1, yours is 3.1.1. Bad things might happen unless you revert pyannote.audio to 0.x.
Model was trained with torch 1.10.0+cu102, yours is 2.1.0+cu121. Bad things might happen unless you revert torch to 1.x.
hyperparams.yaml: 100%
1.92k/1.92k [00:00<00:00, 148kB/s]
embedding_model.ckpt: 100%
83.3M/83.3M [00:00<00:00, 245MB/s]
mean_var_norm_emb.ckpt: 100%
1.92k/1.92k [00:00<00:00, 139kB/s]
classifier.ckpt: 100%
5.53M/5.53M [00:00<00:00, 126MB/s]
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
/usr/local/lib/python3.10/dist-packages/pyannote/audio/pipelines/speaker_diarization.py in __init__(self, segmentation, segmentation_step, embedding, embedding_exclude_overlap, clustering, embedding_batch_size, segmentation_batch_size, der_variant, use_auth_token)
    173         try:
--> 174             Klustering = Clustering[clustering]
    175         except KeyError:

3 frames
KeyError: 'HiddenMarkovModelClustering'

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
/usr/local/lib/python3.10/dist-packages/pyannote/audio/pipelines/speaker_diarization.py in __init__(self, segmentation, segmentation_step, embedding, embedding_exclude_overlap, clustering, embedding_batch_size, segmentation_batch_size, der_variant, use_auth_token)
    174             Klustering = Clustering[clustering]
    175         except KeyError:
--> 176             raise ValueError(
    177                 f'clustering must be one of [{", ".join(list(Clustering.__members__))}]'
    178             )

ValueError: clustering must be one of [AgglomerativeClustering, OracleClustering]

System information

Colab Free Version

Issue description

Same error output as shown under "Tested versions" above.

Minimal reproduction example (MRE)

https://colab.research.google.com/drive/1xCYQ4M84PDcaD7JOmDayh17jbTfLnhZX?usp=sharing

hbredin commented 7 months ago

Thanks for the MRE.

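philschmid/pyannote-speaker-diarization-endpoint is not an official pyannote pipeline, so it is no longer supported in pyannote.audio 3.x. As the traceback shows, it requests HiddenMarkovModelClustering, which pyannote.audio 3.1.1 does not provide.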
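
To illustrate why loading fails, here is a minimal sketch of the lookup visible in the traceback. The `Clustering` enum below is a stand-in whose members are copied from the error message, not pyannote's actual class, and the nested `pipeline` / `params` / `clustering` layout of the config dict is an assumption about how the pipeline's config.yaml is organized. `check_clustering` and `legacy_config` are hypothetical names used only for this example.

```python
from enum import Enum


class Clustering(Enum):
    # Stand-in enum: member names copied from the ValueError message,
    # the values are arbitrary placeholders.
    AgglomerativeClustering = "agglomerative"
    OracleClustering = "oracle"


def check_clustering(config: dict) -> str:
    """Return the clustering name requested by a config, or raise ValueError."""
    # Assumed layout: {"pipeline": {"params": {"clustering": "<name>"}}}
    name = config["pipeline"]["params"]["clustering"]
    if name not in Clustering.__members__:
        supported = ", ".join(Clustering.__members__)
        raise ValueError(f"clustering must be one of [{supported}], got {name!r}")
    return name


# Hypothetical config mirroring what the third-party pipeline appears to request.
legacy_config = {"pipeline": {"params": {"clustering": "HiddenMarkovModelClustering"}}}

try:
    check_clustering(legacy_config)
except ValueError as error:
    print(error)
```

A check like this only reports the mismatch earlier; it does not make the older pipeline loadable in 3.x.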

You should downgrade to pyannote.audio 2.x.
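
A small sketch for checking which major version is installed before trying to load the pipeline. The helper names `major_version` and `needs_downgrade` are made up for this example, and the `"pyannote.audio<3"` requirement in the printed hint is one possible way to pin a 2.x release, not a tested recipe. An older release may also need older torch and torchaudio versions than Colab ships by default.

```python
from importlib.metadata import PackageNotFoundError, version


def major_version(version_string: str) -> int:
    """Extract the leading major number from a version string like '3.1.1'."""
    return int(version_string.split(".")[0])


def needs_downgrade(version_string: str) -> bool:
    """True when the version is 3.x or newer, where this pipeline fails to load."""
    return major_version(version_string) >= 3


try:
    installed = version("pyannote.audio")
except PackageNotFoundError:
    installed = None

if installed is None:
    print("pyannote.audio is not installed")
elif needs_downgrade(installed):
    # Suggested pin is an assumption; adjust to the 2.x release you need.
    print(f'pyannote.audio {installed} found; try: %pip install "pyannote.audio<3"')
else:
    print(f"pyannote.audio {installed} is below 3.x")
```

After reinstalling in Colab, restart the runtime so the older package is the one that gets imported.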