pyannote / pyannote-audio

Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding
http://pyannote.github.io
MIT License

There is a problem with this module “pyannote-audio emb train” #657

Closed TianlongKong closed 3 years ago

TianlongKong commented 3 years ago

Describe the bug: I can't get the embedding step to run. Here is the log:

/root/anaconda3/envs/pyannote_2/lib/python3.8/site-packages/pyannote/audio/embedding/approaches/arcface_loss.py:170: FutureWarning: The 's' parameter is deprecated in favor of 'scale', and will be removed in a future release
  warnings.warn(msg, FutureWarning)
Loading labels: 0file [00:01, ?file/s]
Traceback (most recent call last):
  File "/root/anaconda3/envs/pyannote_2/bin/pyannote-audio", line 8, in <module>
    sys.exit(main())
  File "/root/anaconda3/envs/pyannote_2/lib/python3.8/site-packages/pyannote/audio/applications/pyannote_audio.py", line 366, in main
    app.train(protocol, **params)
  File "/root/anaconda3/envs/pyannote_2/lib/python3.8/site-packages/pyannote/audio/applications/base.py", line 205, in train
    batch_generator = self.task_.get_batch_generator(
  File "/root/anaconda3/envs/pyannote_2/lib/python3.8/site-packages/pyannote/audio/embedding/approaches/base.py", line 111, in get_batch_generator
    return SpeechSegmentGenerator(
  File "/root/anaconda3/envs/pyannote_2/lib/python3.8/site-packages/pyannote/audio/embedding/generators.py", line 99, in __init__
    total_duration = self._load_metadata(protocol, subset=subset)
  File "/root/anaconda3/envs/pyannote_2/lib/python3.8/site-packages/pyannote/audio/embedding/generators.py", line 148, in _load_metadata
    support = Segment(start=0, end=current_file["duration"])
  File "/root/anaconda3/envs/pyannote_2/lib/python3.8/site-packages/pyannote/database/protocol/protocol.py", line 122, in __getitem__
    value = self.lazy[key](self)
  File "/root/anaconda3/envs/pyannote_2/lib/python3.8/site-packages/pyannote/audio/features/utils.py", line 56, in get_audio_duration
    with SoundFile(current_file["audio"], "r") as f:
  File "/root/anaconda3/envs/pyannote_2/lib/python3.8/site-packages/pyannote/database/protocol/protocol.py", line 122, in __getitem__
    value = self.lazy[key](self)
  File "/root/anaconda3/envs/pyannote_2/lib/python3.8/site-packages/pyannote/database/util.py", line 120, in __call__
    path_templates = self.config_[database]
KeyError: 'VoxCeleb'
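
The last frame of the traceback is a plain dictionary lookup: the database configuration that was loaded has no entry under the key 'VoxCeleb'. The sketch below reproduces that failure mode in isolation. It is a simplified stand-in, not the pyannote.database source; the function name lookup_templates and the idea that the config is a plain mapping of database names to path templates are assumptions made for illustration.

```python
def lookup_templates(config, database):
    """Return the list of path templates configured for `database`.

    `config` stands in for the mapping found under the "Databases" key of
    database.yml (an assumption about the loaded structure).
    """
    try:
        templates = config[database]
    except KeyError:
        known = ", ".join(sorted(config)) or "none"
        raise KeyError(
            f"no path templates for database {database!r}; "
            f"configured databases: {known}"
        ) from None
    # A single template may be given as a string; normalize to a list.
    return [templates] if isinstance(templates, str) else list(templates)


if __name__ == "__main__":
    empty = {}  # what the lookup sees if no VoxCeleb entry was loaded
    try:
        lookup_templates(empty, "VoxCeleb")
    except KeyError as err:
        print("lookup failed:", err)
```

If the message lists no configured databases at all, it may be worth checking that the file that was edited is the one actually being loaded, and that its top-level key is spelled the way the loader expects.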

I couldn't find ~/.pyannote/database.yml, so I created it with vim ~/.pyannote/database.yml and added this configuration:

Databases:

      VoxCeleb:
        - /share/kongtianlong/VoxCeleb1/dev/wav/{uri}.wav
        - /share/kongtianlong/VoxCeleb1/test/wav/{uri}.wav
        - /share/kongtianlong/VoxCeleb2/dev/aac/{uri}.wav
        - /share/kongtianlong/VoxCeleb2/test/aac/{uri}.wav

So I guess there is a problem with the VoxCeleb data configuration, but I don't know how to fix it. Can you help me? Thanks!
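
One way to check path templates like these independently of pyannote is to expand {uri} by hand and see which files exist. The sketch below assumes that each template is filled in with str.format and that a file counts as found when exactly that path exists. The helper resolve_audio and the example URI are hypothetical, and the real lookup in pyannote.database may behave differently (for example, it may also expand wildcards).

```python
from pathlib import Path


def resolve_audio(templates, uri):
    """Return every existing file obtained by filling `uri` into the templates."""
    matches = []
    for template in templates:
        candidate = Path(template.format(uri=uri))
        if candidate.is_file():
            matches.append(candidate)
    return matches


if __name__ == "__main__":
    # Templates copied from the configuration quoted in this issue.
    templates = [
        "/share/kongtianlong/VoxCeleb1/dev/wav/{uri}.wav",
        "/share/kongtianlong/VoxCeleb1/test/wav/{uri}.wav",
        "/share/kongtianlong/VoxCeleb2/dev/aac/{uri}.wav",
        "/share/kongtianlong/VoxCeleb2/test/aac/{uri}.wav",
    ]
    # Placeholder URI for illustration only; substitute a real one.
    uri = "some_speaker/some_video/00001"
    found = resolve_audio(templates, uri)
    print(f"{len(found)} match(es) for {uri}:", [str(p) for p in found])
```

Zero matches for a URI that should exist points at the templates or the directory layout. More than one match means two templates overlap for that file.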

To Reproduce Steps to reproduce the behavior:

$ pyannote-audio emb train --subset=train --to=250 --parallel=8 ${EXP_DIR} VoxCeleb.SpeakerVerification.VoxCeleb2
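
The first component of the protocol name on the command line is the database name, which is the key that appears in the KeyError in the log. A minimal sketch of that split, assuming the Database.Task.Protocol naming used in the command; the helper split_protocol_name is hypothetical and not part of pyannote:

```python
def split_protocol_name(full_name):
    """Split 'Database.Task.Protocol' into its three components."""
    parts = full_name.split(".")
    if len(parts) != 3:
        raise ValueError(
            f"expected 'Database.Task.Protocol', got {full_name!r}"
        )
    database, task, protocol = parts
    return database, task, protocol


if __name__ == "__main__":
    # Prints: ('VoxCeleb', 'SpeakerVerification', 'VoxCeleb2')
    print(split_protocol_name("VoxCeleb.SpeakerVerification.VoxCeleb2"))
```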

Content of config.yml

feature_extraction:
   name: pyannote.audio.features.RawAudio
   params:
      sample_rate: 16000

data_augmentation:
   name: pyannote.audio.augmentation.noise.AddNoise
   params:
     snr_min: 5
     snr_max: 15
     collection:
       - MUSAN.Collection.BackgroundNoise
       - MUSAN.Collection.Music

architecture:
   name: pyannote.audio.models.SincTDNN
   params:
      sincnet:
         stride: [5, 1, 1]
         waveform_normalize: True
         instance_normalize: True
      tdnn:
         embedding_dim: 512
      embedding:
         batch_normalize: False
         unit_normalize: False

task:
   name: AdditiveAngularMarginLoss
   params:
      margin: 0.05
      s: 10
      duration: 2.0
      per_fold: 256
      per_label: 1
      per_epoch: 5
      per_turn: 1
      label_min_duration: 30

scheduler:
   name: ConstantScheduler
   params:
      learning_rate: 0.01

pyannote environment

$ pip freeze | grep pyannote
pyannote.audio==1.1.1
pyannote.core==4.1
pyannote.database==4.1
pyannote.db.voxceleb==1.2
pyannote.metrics==3.0.1
pyannote.pipeline==1.5.2


stale[bot] commented 3 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.