pyannote / pyannote-audio

Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding
http://pyannote.github.io
MIT License

Wrong usage of meta-protocols subsets in segmentation tasks #1709

Closed FrenchKrab closed 5 months ago

FrenchKrab commented 6 months ago

Tested versions

Reproducible in 3.2.0, tested with a73ded27f297c6876b722e3c7bb77428a1bac1c7

System information

Linux / pyannote.audio 3.12 / pyannote.database 5.1.0 / Python 3.12

Issue description

In the segmentation task mixins, files are filtered using `self.prepared_data["audio-metadata"]["subset"] == Subsets.index("train")`. This works correctly with regular protocols, but with meta-protocols it appears to match against each file's "original" subset rather than the subset it is mapped to in the meta-protocol.

For example, with this meta-protocol:

```yaml
Protocols:
  X:
    SpeakerDiarization:
      MyMETA:
        train:
          MyProtocol.SpeakerDiarization.A: ['development']
        development:
          MyProtocol.SpeakerDiarization.A: ['development']
```

the 'train' subset will be considered empty (and pyannote will throw errors).
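A minimal sketch of the failure mode, assuming (hypothetically) that each file in the prepared metadata records the subset index of its original protocol rather than its meta-protocol subset; names like `Subsets` and `original_subset` are illustrative, not pyannote's actual internals:

```python
import numpy as np

# Canonical subset names, mirroring pyannote.database's Subsets ordering.
Subsets = ["train", "development", "test"]

# Simulated metadata for 4 files that the meta-protocol maps to 'train',
# but which come from MyProtocol.SpeakerDiarization.A's 'development' subset.
# The recorded subset index is the ORIGINAL one ('development' -> 1).
original_subset = np.array([Subsets.index("development")] * 4)

# The mixin's filtering logic, as quoted in the issue:
train_mask = original_subset == Subsets.index("train")

# No file matches, so the meta 'train' subset appears empty and
# training fails with an error.
print(train_mask.sum())  # 0
```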

I haven't tested it, but I suppose it "fails silently" (i.e., quietly ignores the mismatched subset) in other cases where there is still some data to train on:

```yaml
Protocols:
  X:
    SpeakerDiarization:
      MyMETA:
        train:
          SomeOtherProtocol.SpeakerDiarization.A: ['train']
          MyProtocol.SpeakerDiarization.A: ['development']
        development:
          MyProtocol.SpeakerDiarization.A: ['development']
```
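A sketch of the silent case under the same (hypothetical) assumption that recorded subsets are the original ones: the meta 'train' subset should contain files from both source protocols, but only the files whose original subset happens to be 'train' survive the filter, and no error is raised:

```python
import numpy as np

Subsets = ["train", "development", "test"]

# Meta 'train' combines 3 files from SomeOtherProtocol's 'train' subset
# (original index 0) and 4 files from MyProtocol's 'development' subset
# (original index 1). Counts are illustrative.
original_subset = np.array([0, 0, 0, 1, 1, 1, 1])

# Same filtering logic as before:
train_mask = original_subset == Subsets.index("train")

# Only the 3 originally-'train' files are kept; the 4 development-mapped
# files are silently dropped, and training proceeds on partial data.
print(train_mask.sum())  # 3
```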

Minimal reproduction example (MRE)

https://colab.research.google.com/drive/1kCy30rYG8fWltJfc_xPuX8AdL28y1gMc?usp=sharing

hbredin commented 6 months ago

@clement-pages any idea?