pyannote / pyannote-audio

Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding
http://pyannote.github.io
MIT License
6.37k stars 784 forks

Offline loading of pipeline ('NoneType' object has no attribute 'eval') #1319

Open Sharrnah opened 1 year ago

Sharrnah commented 1 year ago

I am not sure if I am missing something. I followed the documentation on how to load a pipeline for speaker diarization offline.

I followed this tutorial: https://github.com/pyannote/pyannote-audio/blob/develop/tutorials/applying_a_pipeline.ipynb

But I used the config.yaml from https://huggingface.co/pyannote/speaker-diarization instead of the VAD config used in the offline-use section, since I want speaker diarization rather than voice activity detection. (That's a bit confusing, since at the top of that tutorial speaker diarization is used.)

I try to load it like this:

from pyannote.audio import Pipeline as PyannotePipeline
PyannotePipeline.from_pretrained(str(Path(cache_pyannote_path / "speaker-diarization" / "pipeline_config.yaml").resolve()))

but i get the following error:

2023-04-07 20:39:16 - +--------------------- Traceback (most recent call last) ---------------------+
2023-04-07 20:39:16 - | E:\AI\xyz\xyz.py:566 |
2023-04-07 20:39:16 - | in <module>                                                                 |
2023-04-07 20:39:16 - |                                                                             |
2023-04-07 20:39:16 - |   563         return bool(string)                                           |
2023-04-07 20:39:16 - |   564                                                                       |
2023-04-07 20:39:16 - |   565                                                                       |
2023-04-07 20:39:16 - | > 566 main()                                                                |
2023-04-07 20:39:16 - |   567                                                                       |
2023-04-07 20:39:16 - |                                                                             |
2023-04-07 20:39:16 - | E:\Python\Python310\lib\site-packages\click\core.py:1130 in __call__        |
2023-04-07 20:39:16 - |                                                                             |
2023-04-07 20:39:16 - |   1127                                                                      |
2023-04-07 20:39:16 - |   1128     def __call__(self, *args: t.Any, **kwargs: t.Any) -> t.Any:      |
2023-04-07 20:39:16 - |   1129         """Alias for :meth:`main`."""                                |
2023-04-07 20:39:16 - | > 1130         return self.main(*args, **kwargs)                            |
2023-04-07 20:39:16 - |   1131                                                                      |
2023-04-07 20:39:16 - |   1132                                                                      |
2023-04-07 20:39:16 - |   1133 class Command(BaseCommand):                                          |
2023-04-07 20:39:16 - |                                                                             |
2023-04-07 20:39:16 - | E:\Python\Python310\lib\site-packages\click\core.py:1055 in main            |
2023-04-07 20:39:16 - |                                                                             |
2023-04-07 20:39:16 - |   1052         try:                                                         |
2023-04-07 20:39:16 - |   1053             try:                                                     |
2023-04-07 20:39:16 - |   1054                 with self.make_context(prog_name, args, **extra) as  |
2023-04-07 20:39:16 - | > 1055                     rv = self.invoke(ctx)                            |
2023-04-07 20:39:16 - |   1056                     if not standalone_mode:                          |
2023-04-07 20:39:16 - |   1057                         return rv                                    |
2023-04-07 20:39:16 - |   1058                     # it's not safe to `ctx.exit(rv)` here!          |
2023-04-07 20:39:16 - |                                                                             |
2023-04-07 20:39:16 - | E:\Python\Python310\lib\site-packages\click\core.py:1404 in invoke          |
2023-04-07 20:39:16 - |                                                                             |
2023-04-07 20:39:16 - |   1401             echo(style(message, fg="red"), err=True)                 |
2023-04-07 20:39:16 - |   1402                                                                      |
2023-04-07 20:39:16 - |   1403         if self.callback is not None:                                |
2023-04-07 20:39:16 - | > 1404             return ctx.invoke(self.callback, **ctx.params)           |
2023-04-07 20:39:16 - |   1405                                                                      |
2023-04-07 20:39:16 - |   1406     def shell_complete(self, ctx: Context, incomplete: str) -> t.Lis |
2023-04-07 20:39:16 - |   1407         """Return a list of completions for the incomplete value. Lo |
2023-04-07 20:39:16 - |                                                                             |
2023-04-07 20:39:16 - | E:\Python\Python310\lib\site-packages\click\core.py:760 in invoke           |
2023-04-07 20:39:16 - |                                                                             |
2023-04-07 20:39:16 - |    757                                                                      |
2023-04-07 20:39:16 - |    758         with augment_usage_errors(__self):                           |
2023-04-07 20:39:16 - |    759             with ctx:                                                |
2023-04-07 20:39:16 - | >  760                 return __callback(*args, **kwargs)                   |
2023-04-07 20:39:16 - |    761                                                                      |
2023-04-07 20:39:16 - |    762     def forward(                                                     |
2023-04-07 20:39:16 - |    763         __self, __cmd: "Command", *args: t.Any, **kwargs: t.Any  # n |
2023-04-07 20:39:16 - |                                                                             |
2023-04-07 20:39:18 - | E:\Python\Python310\lib\site-packages\click\decorators.py:26 in new_func    |
2023-04-07 20:39:18 - |                                                                             |
2023-04-07 20:39:18 - |    23     """                                                               |
2023-04-07 20:39:18 - |    24                                                                       |
2023-04-07 20:39:18 - |    25     def new_func(*args, **kwargs):  # type: ignore                    |
2023-04-07 20:39:18 - | >  26         return f(get_current_context(), *args, **kwargs)              |
2023-04-07 20:39:18 - |    27                                                                       |
2023-04-07 20:39:18 - |    28     return update_wrapper(t.cast(F, new_func), f)                     |
2023-04-07 20:39:18 - |    29                                                                       |
2023-04-07 20:39:18 - |                                                                             |
2023-04-07 20:39:18 - | E:\AI\xyz\xyz.py:373 |
2023-04-07 20:39:18 - | in main                                                                     |
2023-04-07 20:39:18 - |                                                                             |
2023-04-07 20:39:18 - |   370                                                                       |
2023-04-07 20:39:18 - |   371         # load speaker diarization model                              |
2023-04-07 20:39:18 - |   372         #diarization_model = PyannoteModel.from_pretrained(str(Path(c |
2023-04-07 20:39:18 - | > 373         diarization_pipeline = PyannotePipeline.from_pretrained(str(P |
2023-04-07 20:39:18 - |   374                                                                       |
2023-04-07 20:39:18 - |   375         # num_samples = 1536                                          |
2023-04-07 20:39:18 - |   376         num_samples = int(settings.SetOption("vad_num_samples",       |
2023-04-07 20:39:18 - |                                                                             |
2023-04-07 20:39:18 - | E:\Python\Python310\lib\site-packages\pyannote\audio\core\pipeline.py:126   |
2023-04-07 20:39:18 - | in from_pretrained                                                          |
2023-04-07 20:39:18 - |                                                                             |
2023-04-07 20:39:18 - |   123         )                                                             |
2023-04-07 20:39:18 - |   124         params = config["pipeline"].get("params", {})                 |
2023-04-07 20:39:18 - |   125         params.setdefault("use_auth_token", use_auth_token)           |
2023-04-07 20:39:18 - | > 126         pipeline = Klass(**params)                                    |
2023-04-07 20:39:18 - |   127                                                                       |
2023-04-07 20:39:18 - |   128         # freeze  parameters                                          |
2023-04-07 20:39:18 - |   129         if "freeze" in config:                                        |
2023-04-07 20:39:18 - |                                                                             |
2023-04-07 20:39:18 - | E:\Python\Python310\lib\site-packages\pyannote\audio\pipelines\speaker_diar |
2023-04-07 20:39:18 - | ization.py:125 in __init__                                                  |
2023-04-07 20:39:18 - |                                                                             |
2023-04-07 20:39:18 - |   122         super().__init__()                                            |
2023-04-07 20:39:18 - |   123                                                                       |
2023-04-07 20:39:18 - |   124         self.segmentation_model = segmentation                        |
2023-04-07 20:39:18 - | > 125         model: Model = get_model(segmentation, use_auth_token=use_aut |
2023-04-07 20:39:18 - |   126                                                                       |
2023-04-07 20:39:18 - |   127         self.segmentation_batch_size = segmentation_batch_size        |
2023-04-07 20:39:18 - |   128         self.segmentation_duration = (                                |
2023-04-07 20:39:18 - |                                                                             |
2023-04-07 20:39:18 - | E:\Python\Python310\lib\site-packages\pyannote\audio\pipelines\utils\getter |
2023-04-07 20:39:18 - | .py:89 in get_model                                                         |
2023-04-07 20:39:18 - |                                                                             |
2023-04-07 20:39:18 - |    86             f"expected `str` or `dict`."                              |
2023-04-07 20:39:18 - |    87         )                                                             |
2023-04-07 20:39:18 - |    88                                                                       |
2023-04-07 20:39:18 - | >  89     model.eval()                                                      |
2023-04-07 20:39:18 - |    90     return model                                                      |
2023-04-07 20:39:18 - |    91                                                                       |
2023-04-07 20:39:18 - |    92                                                                       |
2023-04-07 20:39:18 - +-----------------------------------------------------------------------------+
2023-04-07 20:39:18 - AttributeError: 'NoneType' object has no attribute 'eval'
github-actions[bot] commented 1 year ago

Thank you for your issue. Give us a little time to review it.

PS. You might want to check the FAQ if you haven't done so already.

This is an automated reply, generated by FAQtory

gamingflexer commented 1 year ago
  1. Visit hf.co/pyannote/speaker-diarization
  2. Visit hf.co/pyannote/segmentation

Accept the user conditions of both models, and pass your user token when downloading the pretrained pipeline:

from pyannote.audio import Pipeline
pipeline = Pipeline.from_pretrained("pyannote/speaker-diarization",
                                    use_auth_token="TOKEN HERE")
Sharrnah commented 1 year ago
  1. Visit hf.co/pyannote/speaker-diarization
  2. Visit hf.co/pyannote/segmentation

Accept the user conditions of both models, and pass your user token when downloading the pretrained pipeline:

from pyannote.audio import Pipeline
pipeline = Pipeline.from_pretrained("pyannote/speaker-diarization",
                                    use_auth_token="TOKEN HERE")

The whole reason for offline loading is that you do not need a token for every user of the software. I can't expect every user to create a Hugging Face account.

albyho commented 1 year ago
  1. Edit your/path/to/pyannote/speaker-diarization/config.yaml
pipeline:
  name: pyannote.audio.pipelines.SpeakerDiarization
  params:
    clustering: AgglomerativeClustering
    embedding: your/path/to/speechbrain/spkrec-ecapa-voxceleb # Folder; the path must contain the `speechbrain` keyword.
    embedding_batch_size: 32
    embedding_exclude_overlap: true
    segmentation: your/path/to/pyannote/segmentation/pytorch_model@2.1.bin # File
    segmentation_batch_size: 32

params:
  clustering:
    method: centroid
    min_cluster_size: 15
    threshold: 0.7153814381597874
  segmentation:
    min_duration_off: 0.5817029604921046
    threshold: 0.4442333667381752
  2. Edit pyannote/audio/pipelines/speaker_verification.py (version 2.1.1)
        self.classifier_ = SpeechBrain_EncoderClassifier.from_hparams(
            source=self.embedding,
            savedir=self.embedding if Path(self.embedding).exists() else f"{CACHE_DIR}/speechbrain",
            run_opts={"device": self.device},
            use_auth_token=use_auth_token,
        )
  3. Run speaker diarization
from pyannote.audio import Pipeline
pipeline = Pipeline.from_pretrained("your/path/to/pyannote/speaker-diarization/config.yaml")
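A mistyped local path in any of the steps above only surfaces later as the opaque 'NoneType' error. As a sketch (the helper name and the example paths are placeholders, not part of pyannote's API), the files referenced by the config can be checked up front to fail fast with a clearer message:

```python
from pathlib import Path

def missing_paths(paths):
    """Return the subset of paths that do not exist on disk."""
    return [str(p) for p in paths if not Path(p).exists()]

# Mirror the entries of the example config above; adjust to your layout.
# required = [
#     "your/path/to/pyannote/speaker-diarization/config.yaml",
#     "your/path/to/pyannote/segmentation/pytorch_model@2.1.bin",
#     "your/path/to/speechbrain/spkrec-ecapa-voxceleb",
# ]
# missing = missing_paths(required)
# if missing:
#     raise FileNotFoundError(f"offline assets not found: {missing}")
```

Running this before Pipeline.from_pretrained makes it obvious whether the error comes from a bad path or from something inside pyannote.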
Sharrnah commented 1 year ago

Thanks, but still no luck. I placed the spkrec-ecapa-voxceleb files beside the pipeline config and changed the config accordingly:

pipeline:
  name: pyannote.audio.pipelines.SpeakerDiarization
  params:
    clustering: AgglomerativeClustering
    embedding: speechbrain/spkrec-ecapa-voxceleb
    embedding_batch_size: 32
    embedding_exclude_overlap: true
    segmentation: pytorch_model.bin
    segmentation_batch_size: 32

params:
  clustering:
    method: centroid
    min_cluster_size: 15
    threshold: 0.7153814381597874
  segmentation:
    min_duration_off: 0.5817029604921046
    threshold: 0.4442333667381752

and changed speaker_verification.py as you mentioned.

albyho commented 1 year ago

You need to download the files from the speechbrain/spkrec-ecapa-voxceleb repository into your local speechbrain/spkrec-ecapa-voxceleb directory:

classifier.ckpt, embedding_model.ckpt, hyperparams.yaml, label_encoder.ckpt, mean_var_norm_emb.ckpt
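Assuming network access is available once at setup time, those five files can be fetched with the huggingface_hub package (the helper below is illustrative, not part of pyannote):

```python
# The five SpeechBrain embedding files needed for offline use.
SPEECHBRAIN_FILES = [
    "classifier.ckpt",
    "embedding_model.ckpt",
    "hyperparams.yaml",
    "label_encoder.ckpt",
    "mean_var_norm_emb.ckpt",
]

def download_embedding_files(target_dir: str) -> None:
    # One-time online step; afterwards `target_dir` can be used as the
    # `embedding:` path in config.yaml for fully offline runs.
    from huggingface_hub import hf_hub_download
    for name in SPEECHBRAIN_FILES:
        hf_hub_download(repo_id="speechbrain/spkrec-ecapa-voxceleb",
                        filename=name, local_dir=target_dir)

# download_embedding_files("your/path/to/speechbrain/spkrec-ecapa-voxceleb")
```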

funboarder13920 commented 1 year ago

Hello @hbredin ,

Would a PR containing the following point be accepted for offline model loading?

2. Edit `pyannote/audio/pipelines/speaker_verification.py` (version 2.1.1)
        self.classifier_ = SpeechBrain_EncoderClassifier.from_hparams(
            source=self.embedding,
            savedir=self.embedding if Path(self.embedding).exists() else f"{CACHE_DIR}/speechbrain",
            run_opts={"device": self.device},
            use_auth_token=use_auth_token,
        )

The modified line is `savedir=self.embedding if Path(self.embedding).exists() else f"{CACHE_DIR}/speechbrain",` at https://github.com/pyannote/pyannote-audio/blob/11b56a137a578db9335efc00298f6ec1932e6317/pyannote/audio/pipelines/speaker_verification.py#L260
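The effect of that one-line change can be sketched in isolation: if the configured embedding points at an existing local folder, reuse it as SpeechBrain's savedir so nothing is fetched from the Hub; otherwise fall back to the shared cache (the CACHE_DIR value below is a stand-in for pyannote's actual constant):

```python
from pathlib import Path

CACHE_DIR = "/tmp/pyannote-cache"  # stand-in; pyannote defines its own CACHE_DIR

def choose_savedir(embedding: str) -> str:
    # Existing local folder -> use it directly (offline).
    # Anything else (e.g. a Hub repo id) -> download into the shared cache.
    return embedding if Path(embedding).exists() else f"{CACHE_DIR}/speechbrain"
```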

hbredin commented 1 year ago

I'd gladly have a look at a PR facilitating the offline use of pyannote. Would be nice to also update the related part of the documentation.

funboarder13920 commented 1 year ago

I will take a look and submit a PR.

eguar11011 commented 1 year ago

Does the tutorial not work? I'm getting the same error when running it:

from pyannote.audio import Pipeline
pipeline = Pipeline.from_pretrained('pyannote/speaker-diarization', use_auth_token=True)

Error:

AttributeError: 'NoneType' object has no attribute 'eval'

ashu5644 commented 1 year ago

Hi @hbredin

I am also facing a similar issue with offline usage with a VAD config (pyannote.audio version 2.1.1).

Config:

pipeline:
  name: pyannote.audio.pipelines.VoiceActivityDetection
  params:
    segmentation: pytorch_model.bin

params:
  min_duration_off: 0.09791355693027545
  min_duration_on: 0.05537587440407595
  offset: 0.4806866463041527
  onset: 0.8104268538848918

Code:

pipeline = Pipeline.from_pretrained("vad_config.yaml")

Error: AttributeError: 'NoneType' object has no attribute 'eval'
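One thing worth double-checking in this setup (an assumption on my part, since a plain string like this is treated as a checkpoint path): a relative `segmentation: pytorch_model.bin` is resolved against the current working directory, not the config file's folder, so writing an absolute path into the config is safer. A sketch that renders such a config with the path resolved:

```python
from pathlib import Path
from textwrap import dedent

def render_vad_config(checkpoint: Path) -> str:
    # Embedding an absolute checkpoint path removes any dependence on the
    # working directory at load time.
    return dedent(f"""\
        pipeline:
          name: pyannote.audio.pipelines.VoiceActivityDetection
          params:
            segmentation: {checkpoint.resolve()}

        params:
          min_duration_off: 0.09791355693027545
          min_duration_on: 0.05537587440407595
          offset: 0.4806866463041527
          onset: 0.8104268538848918
        """)

# Path("vad_config.yaml").write_text(render_vad_config(Path("pytorch_model.bin")))
# from pyannote.audio import Pipeline
# pipeline = Pipeline.from_pretrained("vad_config.yaml")
```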

KobaKhit commented 1 year ago

Hello. Are there plans to make offline use of the speaker-diarization-3.0 pipeline work? I tried the above suggestions to no avail.

hbredin commented 1 year ago

pyannote models and pipelines have always been usable offline. The documentation is just... missing.

Also, feel free to make a PR improving the documentation!

Sharrnah commented 1 year ago

Except that this is exactly what I tried some time ago, without success.

I would have to try it again to see whether it works now or whether I was just missing some detail.

So yes, updated documentation would help a lot, if someone gets this to work and updates it.

tonytorm commented 8 months ago

The issue is still there as of today: the model is not found for some reason and None is returned. Can someone look into this, please?

sjtu-hxj commented 7 months ago

https://github.com/pyannote/pyannote-audio/pull/1682

simonottenhauskenbun commented 7 months ago

I did not see this issue when proposing the respective PR: https://github.com/pyannote/pyannote-audio/pull/1682

Please check whether the new tutorial addresses these issues: https://github.com/pyannote/pyannote-audio/blob/develop/tutorials/community/offline_usage_speaker_diarization.ipynb

darkzbaron commented 5 months ago

I solved this by accepting both user conditions, as mentioned in the README.

wwfcnu commented 4 months ago
  1. Edit your/path/to/pyannote/speaker-diarization/config.yaml
pipeline:
  name: pyannote.audio.pipelines.SpeakerDiarization
  params:
    clustering: AgglomerativeClustering
    embedding: your/path/to/speechbrain/spkrec-ecapa-voxceleb # Folder; the path must contain the `speechbrain` keyword.
    embedding_batch_size: 32
    embedding_exclude_overlap: true
    segmentation: your/path/to/pyannote/segmentation/pytorch_model@2.1.bin # File
    segmentation_batch_size: 32

params:
  clustering:
    method: centroid
    min_cluster_size: 15
    threshold: 0.7153814381597874
  segmentation:
    min_duration_off: 0.5817029604921046
    threshold: 0.4442333667381752
  2. Edit pyannote/audio/pipelines/speaker_verification.py (version 2.1.1)
        self.classifier_ = SpeechBrain_EncoderClassifier.from_hparams(
            source=self.embedding,
            savedir=self.embedding if Path(self.embedding).exists() else f"{CACHE_DIR}/speechbrain",
            run_opts={"device": self.device},
            use_auth_token=use_auth_token,
        )
  3. Run speaker diarization
from pyannote.audio import Pipeline
pipeline = Pipeline.from_pretrained("your/path/to/pyannote/speaker-diarization/config.yaml")

Following your method, I still get an error: huggingface_hub.utils._validators.HFValidationError: Repo id must be in the form 'repo_name' or 'namespace/repo_name'

chenchun0629 commented 1 week ago

Here is my config.yaml:

version: 3.1.0

pipeline:
  name: pyannote.audio.pipelines.SpeakerDiarization
  params:
    clustering: AgglomerativeClustering
    #embedding: pyannote/wespeaker-voxceleb-resnet34-LM
    embedding: /path/to/models/hbredin/wespeaker-voxceleb-resnet34-LM/speaker-embedding.onnx
    embedding_batch_size: 32
    embedding_exclude_overlap: true
    #segmentation: pyannote/segmentation-3.0
    segmentation: /path/to/models/pyannote/segmentation-3.0/pytorch_model.bin
    segmentation_batch_size: 32

params:
  clustering:
    method: centroid
    min_cluster_size: 12
    threshold: 0.7045654963945799
  segmentation:
    min_duration_off: 0.0