Fixes
- The new `prepare_data` implementation (which allows caching to file) changes how things are stored. The PR did not update `self.prepared_data["metadata"]` -> `self.prepared_data["metadata-values"]` in the segmentation mixin.
- A failure when some combinations of the balanced keys do not exist. For example, with `balance=['database','domain']`, the implementation creates one sample generator for each possible combination (`itertools.product` of all values), but some of those combinations might not occur in the data. To fix this, a generator that cannot produce anything returns `None` as its first and only value (there might be a cleaner way).

I don't have time to really test this, but it should fix https://github.com/nttcslab-sp/mamba-diarization/issues/6 !
EDIT: might or might not work with pyannote's latest versions, needs testing
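
For reviewers, here is a minimal standalone sketch of the `None` sentinel idea described above. It is not the code from this PR or from pyannote: `subset_samples`, `balanced_samples` and the dict-based `files` are hypothetical names made up for illustration, and the real mixin works on `self.prepared_data` instead.

```python
import itertools


def subset_samples(files, key):
    """Hypothetical per-combination generator.

    Cycles forever over the files matching `key`. If no file matches,
    yields None as its first and only value, then stops.
    """
    matching = [f for f in files if all(f[k] == v for k, v in key.items())]
    if not matching:
        yield None
        return
    while True:
        for f in matching:
            yield f


def balanced_samples(files, balance):
    """Alternate between one generator per combination of `balance` values."""
    # All observed values for each balanced key, in a deterministic order.
    values = [sorted({f[k] for f in files}) for k in balance]

    generators = []
    for combo in itertools.product(*values):
        gen = subset_samples(files, dict(zip(balance, combo)))
        first = next(gen)
        if first is None:
            # This combination does not exist in the data: skip it.
            continue
        # Put the peeked sample back in front of the generator.
        generators.append(itertools.chain([first], gen))

    # itertools.cycle over an empty list simply ends, so no samples are produced.
    for gen in itertools.cycle(generators):
        yield next(gen)


files = [
    {"uri": "a", "database": "A", "domain": "x"},
    {"uri": "b", "database": "B", "domain": "y"},
]
samples = itertools.islice(balanced_samples(files, ["database", "domain"]), 4)
print([s["uri"] for s in samples])
```

With these two files, `itertools.product` yields four combinations but only two of them exist, so the two empty generators are dropped after returning `None`.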