Error when loading diarization model to pipeline

gullerg commented 2 years ago

Describe the bug

I'm trying out pyannote today and am having trouble getting started with the diarization pipeline. Whenever I try to load the model from torch hub, I get the following error:

Using cache found in /Users/X/.cache/torch/hub/pyannote_pyannote-audio_master
Using cache found in /Users/X/.cache/torch/hub/pyannote_pyannote-audio_master
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/opt/anaconda3/envs/ml/lib/python3.9/multiprocessing/spawn.py", line 116, in spawn_main
    exitcode = _main(fd, parent_sentinel)
  File "/opt/anaconda3/envs/ml/lib/python3.9/multiprocessing/spawn.py", line 125, in _main
    prepare(preparation_data)
  File "/opt/anaconda3/envs/ml/lib/python3.9/multiprocessing/spawn.py", line 236, in prepare
    _fixup_main_from_path(data['init_main_from_path'])
  File "/opt/anaconda3/envs/ml/lib/python3.9/multiprocessing/spawn.py", line 287, in _fixup_main_from_path
    main_content = runpy.run_path(main_path,
  File "/opt/anaconda3/envs/ml/lib/python3.9/runpy.py", line 268, in run_path
    return _run_module_code(code, init_globals, run_name,
  File "/opt/anaconda3/envs/ml/lib/python3.9/runpy.py", line 97, in _run_module_code
    _run_code(code, mod_globals, init_globals,
  File "/opt/anaconda3/envs/ml/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/Users/X/dev/extract-audio-from-video/test.py", line 3, in <module>
    pipeline = torch.hub.load('pyannote/pyannote-audio', 'dia')
  File "/opt/anaconda3/envs/ml/lib/python3.9/site-packages/torch/hub.py", line 382, in load
    model = _load_local(repo_or_dir, model, *args, **kwargs)
  File "/opt/anaconda3/envs/ml/lib/python3.9/site-packages/torch/hub.py", line 411, in _load_local
    model = entry(*args, **kwargs)
  File "/Users/X/.cache/torch/hub/pyannote_pyannote-audio_master/hubconf.py", line 242, in _generic
    from pyannote.audio.pipeline.utils import load_pretrained_pipeline
  File "/opt/anaconda3/envs/ml/lib/python3.9/site-packages/pyannote/audio/pipeline/__init__.py", line 33, in <module>
    from .speech_activity_detection import SpeechActivityDetection
  File "/opt/anaconda3/envs/ml/lib/python3.9/site-packages/pyannote/audio/pipeline/speech_activity_detection.py", line 44, in <module>
    from pyannote.metrics.detection import DetectionErrorRate
  File "/opt/anaconda3/envs/ml/lib/python3.9/site-packages/pyannote/metrics/__init__.py", line 40, in <module>
    manager_ = Manager()
  File "/opt/anaconda3/envs/ml/lib/python3.9/multiprocessing/context.py", line 57, in Manager
    m.start()
  File "/opt/anaconda3/envs/ml/lib/python3.9/multiprocessing/managers.py", line 554, in start
    self._process.start()
  File "/opt/anaconda3/envs/ml/lib/python3.9/multiprocessing/process.py", line 121, in start
    self._popen = self._Popen(self)
  File "/opt/anaconda3/envs/ml/lib/python3.9/multiprocessing/context.py", line 284, in _Popen
    return Popen(process_obj)
  File "/opt/anaconda3/envs/ml/lib/python3.9/multiprocessing/popen_spawn_posix.py", line 32, in __init__
    super().__init__(process_obj)
  File "/opt/anaconda3/envs/ml/lib/python3.9/multiprocessing/popen_fork.py", line 19, in __init__
    self._launch(process_obj)
  File "/opt/anaconda3/envs/ml/lib/python3.9/multiprocessing/popen_spawn_posix.py", line 42, in _launch
    prep_data = spawn.get_preparation_data(process_obj._name)
  File "/opt/anaconda3/envs/ml/lib/python3.9/multiprocessing/spawn.py", line 154, in get_preparation_data
    _check_not_importing_main()
  File "/opt/anaconda3/envs/ml/lib/python3.9/multiprocessing/spawn.py", line 134, in _check_not_importing_main
    raise RuntimeError('''
RuntimeError: 
        An attempt has been made to start a new process before the
        current process has finished its bootstrapping phase.

        This probably means that you are not using fork to start your
        child processes and you have forgotten to use the proper idiom
        in the main module:

            if __name__ == '__main__':
                freeze_support()
                ...

        The "freeze_support()" line can be omitted if the program
        is not going to be frozen to produce an executable.
Traceback (most recent call last):
  File "/Users/X/dev/extract-audio-from-video/test.py", line 3, in <module>
    pipeline = torch.hub.load('pyannote/pyannote-audio', 'dia')
  File "/opt/anaconda3/envs/ml/lib/python3.9/site-packages/torch/hub.py", line 382, in load
    model = _load_local(repo_or_dir, model, *args, **kwargs)
  File "/opt/anaconda3/envs/ml/lib/python3.9/site-packages/torch/hub.py", line 411, in _load_local
    model = entry(*args, **kwargs)
  File "/Users/X/.cache/torch/hub/pyannote_pyannote-audio_master/hubconf.py", line 242, in _generic
    from pyannote.audio.pipeline.utils import load_pretrained_pipeline
  File "/opt/anaconda3/envs/ml/lib/python3.9/site-packages/pyannote/audio/pipeline/__init__.py", line 33, in <module>
    from .speech_activity_detection import SpeechActivityDetection
  File "/opt/anaconda3/envs/ml/lib/python3.9/site-packages/pyannote/audio/pipeline/speech_activity_detection.py", line 44, in <module>
    from pyannote.metrics.detection import DetectionErrorRate
  File "/opt/anaconda3/envs/ml/lib/python3.9/site-packages/pyannote/metrics/__init__.py", line 40, in <module>
    manager_ = Manager()
  File "/opt/anaconda3/envs/ml/lib/python3.9/multiprocessing/context.py", line 57, in Manager
    m.start()
  File "/opt/anaconda3/envs/ml/lib/python3.9/multiprocessing/managers.py", line 558, in start
    self._address = reader.recv()
  File "/opt/anaconda3/envs/ml/lib/python3.9/multiprocessing/connection.py", line 255, in recv
    buf = self._recv_bytes()
  File "/opt/anaconda3/envs/ml/lib/python3.9/multiprocessing/connection.py", line 419, in _recv_bytes
    buf = self._recv(4)
  File "/opt/anaconda3/envs/ml/lib/python3.9/multiprocessing/connection.py", line 388, in _recv
    raise EOFError
EOFError

To Reproduce

I simply run the following lines of code:

import torch
DEMO_FILE = {"audio": "./test.wav"}
pipeline = torch.hub.load('pyannote/pyannote-audio', 'dia')
diarization = pipeline(DEMO_FILE)

Content of config.yml

pipeline:
   name: pyannote.audio.pipeline.speaker_diarization.SpeakerDiarization
   params:
      sad_scores: sad_ami
      scd_scores: scd_ami
      embedding: emb_ami
      evaluation_only: True
      method: affinity_propagation

freeze:
   speech_turn_segmentation:
      speech_activity_detection:
         min_duration_off: 0.6315121069334447
         min_duration_on: 0.0007366523493967721
         offset: 0.5727193137037349
         onset: 0.5842225805454029
         pad_offset: 0.0
         pad_onset: 0.0

Additional context

I keep the Python file and the YAML file in the same folder.

hbredin commented 2 years ago

I think this is due to recent changes in (macOS, right?) Python multiprocessing.

Can you try this line? Do you get the same error?

from pyannote.metrics.detection import DetectionErrorRate

gullerg commented 2 years ago

Yeah, I get the same error when I run that (on macOS).
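For readers hitting the same traceback: it shows pyannote/metrics/__init__.py calling Manager() at import time, and the RuntimeError text points at the `if __name__ == '__main__':` idiom. The sketch below illustrates that idiom on its own, assuming the cause is the "spawn" start method (the default on macOS since Python 3.8). It is not pyannote code; `make_shared_dict` and `main` are hypothetical names used only for illustration.

```python
import multiprocessing


def make_shared_dict(ctx):
    """Start a manager process and return it with a shared dict proxy."""
    manager = ctx.Manager()  # launches a child process
    return manager, manager.dict()


def main():
    # Explicitly request "spawn" so the behaviour matches macOS defaults
    # (assumption: this is the start method involved in the report above).
    ctx = multiprocessing.get_context("spawn")
    manager, shared = make_shared_dict(ctx)
    shared["status"] = "ok"
    print(dict(shared))
    manager.shutdown()


# With "spawn", the child process re-imports the main module. Anything that
# starts a process at module top level would run again in the child and raise
# the RuntimeError shown in the traceback, so it goes behind this guard.
if __name__ == "__main__":
    main()
```

Wrapping the `torch.hub.load(...)` call from test.py in the same kind of guard may work around the error on affected versions, though that has not been verified in this thread.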

hbredin commented 2 years ago

I just transferred this issue to pyannote.metrics.

Can you please let me know about:

your operating system (Windows, Linux, macOS) and its version
the version of your Python

gullerg commented 2 years ago

macOS Big Sur, version 11.4
Python 3.8.8

hbredin commented 2 years ago

Fixed in pyannote.metrics 3.1

pip install pyannote.metrics==3.1
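To confirm which version ended up in the active environment after upgrading, a small check along these lines may help. It is a sketch using only the standard library (`importlib.metadata`, available since Python 3.8); `installed_version` is a hypothetical helper, and it assumes the distribution is registered under the name used in the pip command above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version string, or None if not installed."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Prints the installed version, or None when the package is missing.
print(installed_version("pyannote.metrics"))
```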

pyannote / pyannote-metrics

Error when loading diarization model to pipeline #53