Tested versions
Reproducible in 3.2.0, tested with a73ded27f297c6876b722e3c7bb77428a1bac1c7
System information
Linux / pyannote.audio 3.12 / pyannote.database 5.1.0 / Python 3.12
Issue description
In the mixins of the segmentation task, filtering is done using
self.prepared_data["audio-metadata"]["subset"] == Subsets.index("train").
This works perfectly with normal protocols, but with meta-protocols it seems to rely on the "original" subset rather than the meta-protocol one. For example, in the following meta-protocol:
the 'train' subset is considered empty (and pyannote raises errors).
I haven't tested it, but I suspect it "fails silently" (i.e., ignores the affected subset) in other cases where there is still data to train on:
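A minimal sketch of the suspected behavior, assuming (as the report suggests) that the stored subset index comes from the underlying protocol rather than from the meta-protocol. This is not pyannote.audio's code: `Subsets` is redefined locally to mimic the ["train", "development", "test"] ordering, and `FileMeta`, `train_uris`, and the protocol/file names are hypothetical. It contrasts filtering on the original subset with filtering on the meta-protocol subset for both cases above.

```python
from dataclasses import dataclass

# Assumption: mimics the subset ordering used for the integer "subset" field;
# defined here so the sketch is self-contained.
Subsets = ["train", "development", "test"]


@dataclass
class FileMeta:
    uri: str
    original_subset: str  # hypothetical: subset in the underlying protocol
    meta_subset: str      # hypothetical: subset assigned by the meta-protocol


def train_uris(files, use_meta):
    """Return the URIs kept by a `subset == Subsets.index("train")` filter."""
    key = "meta_subset" if use_meta else "original_subset"
    # Encode each file's subset as an integer index, like the prepared metadata.
    subset_idx = [Subsets.index(getattr(f, key)) for f in files]
    return [f.uri for f, s in zip(files, subset_idx) if s == Subsets.index("train")]


# Case 1 (hypothetical): the meta-protocol's train subset is built only
# from protocol A's development subset -> train looks empty.
case1 = [
    FileMeta("A_dev_1", "development", "train"),
    FileMeta("A_dev_2", "development", "train"),
]

# Case 2 (hypothetical): train mixes A's train subset with B's development
# subset -> B's files are silently dropped.
case2 = [
    FileMeta("A_train_1", "train", "train"),
    FileMeta("B_dev_1", "development", "train"),
]

print(train_uris(case1, use_meta=False))  # []
print(train_uris(case1, use_meta=True))   # ['A_dev_1', 'A_dev_2']
print(train_uris(case2, use_meta=False))  # ['A_train_1']
print(train_uris(case2, use_meta=True))   # ['A_train_1', 'B_dev_1']
```

Filtering on the original subset reproduces both symptoms (an empty train set, then a partially ignored one), while filtering on the meta-protocol subset keeps every file the meta-protocol assigns to "train".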
Minimal reproduction example (MRE)
https://colab.research.google.com/drive/1kCy30rYG8fWltJfc_xPuX8AdL28y1gMc?usp=sharing