pyannote / pyannote-audio

Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding
http://pyannote.github.io
MIT License
6.03k stars, 758 forks

3.3 dependencies #1727

Closed faroit closed 3 months ago

faroit commented 3 months ago

Tested versions

System information

macOS, M1

Issue description

After installing the most recent 3.3 release and trying out the new PixIT pipeline, I get the following errors (this is after downgrading to numpy 1.26.4, since numpy 2.0 isn't compatible: np.NaN was removed there in favor of np.nan):

Lightning automatically upgraded your loaded checkpoint from v1.8.6 to v2.3.0. To apply the upgrade to your files permanently, run `python -m pytorch_lightning.utilities.upgrade_checkpoint ../../.cache/torch/pyannote/models--pyannote--separation-ami-1.0/snapshots/7d32de4e893657cd44dc643c9f6d413e90c051bc/pytorch_model.bin`
Some weights of the model checkpoint at microsoft/wavlm-large were not used when initializing WavLMModel: ['encoder.pos_conv_embed.conv.weight_g', 'encoder.pos_conv_embed.conv.weight_v']
- This IS expected if you are initializing WavLMModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing WavLMModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of WavLMModel were not initialized from the model checkpoint at microsoft/wavlm-large and are newly initialized: ['encoder.pos_conv_embed.conv.parametrizations.weight.original0', 'encoder.pos_conv_embed.conv.parametrizations.weight.original1']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Traceback (most recent call last):
  File "/Users/faro/repositories/pixit_test/run.py", line 3, in <module>
    pipeline = Pipeline.from_pretrained(
  File "/Users/faro/repositories/pixit_test/env/lib/python3.9/site-packages/pyannote/audio/core/pipeline.py", line 137, in from_pretrained
    pipeline = Klass(**params)
  File "/Users/faro/repositories/pixit_test/env/lib/python3.9/site-packages/pyannote/audio/pipelines/speech_separation.py", line 175, in __init__
    self._embedding = PretrainedSpeakerEmbedding(
  File "/Users/faro/repositories/pixit_test/env/lib/python3.9/site-packages/pyannote/audio/pipelines/speaker_verification.py", line 750, in PretrainedSpeakerEmbedding
    return SpeechBrainPretrainedSpeakerEmbedding(
  File "/Users/faro/repositories/pixit_test/env/lib/python3.9/site-packages/pyannote/audio/pipelines/speaker_verification.py", line 241, in __init__
    raise ImportError(
ImportError: 'speechbrain' must be installed to use 'speechbrain/spkrec-ecapa-voxceleb@5c0be3875fda05e81f3c004ed8c7c06be308de1e' embeddings. Visit https://speechbrain.github.io for installation instructions.

speechbrain==1.0.0 is installed
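
When a wrapper reports that a package "must be installed" although pip lists it, one way to narrow things down is to import the package directly from the same interpreter and keep the full traceback. The following is a minimal diagnostic sketch, not part of pyannote.audio or speechbrain: the helper name `probe` is hypothetical, and it assumes the friendly ImportError may be masking a lower-level import failure, which this thread does not confirm.

```python
import importlib
import importlib.metadata
import sys
import traceback


def probe(module_name, dist_name=None):
    """Report the installed version of a package and whether it imports.

    `probe` is a hypothetical helper for illustration only.
    """
    dist_name = dist_name or module_name
    report = {"python": sys.executable, "version": None, "error": None}

    # Version as recorded by the packaging metadata of *this* environment.
    try:
        report["version"] = importlib.metadata.version(dist_name)
    except importlib.metadata.PackageNotFoundError:
        pass  # no such distribution visible to this interpreter

    # Import the module directly so that any underlying exception is visible.
    try:
        importlib.import_module(module_name)
    except Exception:
        report["error"] = traceback.format_exc()

    return report
```

Running `probe("speechbrain")` with the interpreter from the traceback above (the one under `env/`) would show whether the package is visible to that environment and, if the import itself fails, the original exception rather than the generic installation hint.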

Minimal reproduction example (MRE)

https://gist.github.com/faroit/fe4a29967debe5174b0d9391a7b008db

faroit commented 3 months ago

Seems to be a duplicate of https://github.com/pyannote/pyannote-audio/issues/1661

faroit commented 3 months ago

thanks @hbredin