Closed · juanmc2005 closed this issue 11 months ago
This could be done in a pretty straightforward way by checking the activation layer of the loaded segmentation model. In `diart.models.PyannoteSegmentationModel.__call__()`, we can add the following:
```python
from pyannote.audio.utils.powerset import Powerset

segmentation = self.model(waveform)
if isinstance(self.model.activation, torch.nn.LogSoftmax):  # or Softmax
    powerset = Powerset(max_speakers_per_chunk, max_speakers_per_frame)
    return powerset.to_multilabel(segmentation)
return segmentation
```
I would recommend checking `self.model.specifications.powerset` instead.

Also, `Powerset.to_multilabel` now has a `soft` keyword argument that you can set to `True` to get soft multi-label segmentation (though I would recommend sticking with hard outputs, which remove the need for an activation threshold).
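For intuition, here is a minimal pure-Python sketch of what *hard* powerset-to-multilabel decoding does (the function names and the class ordering are illustrative assumptions, not pyannote's actual API): take the argmax over powerset classes per frame, then activate exactly the speakers in the winning class, which is why no activation threshold is needed.

```python
from itertools import combinations

def powerset_classes(max_speakers, max_simultaneous):
    # One powerset class per subset of speakers up to the simultaneity
    # limit: empty set (silence), singletons, pairs, ...
    classes = []
    for k in range(max_simultaneous + 1):
        classes.extend(combinations(range(max_speakers), k))
    return classes

def hard_to_multilabel(frame_probs, max_speakers, max_simultaneous):
    # Hard decoding: pick the argmax powerset class for each frame and
    # activate exactly the speakers in that class.
    classes = powerset_classes(max_speakers, max_simultaneous)
    output = []
    for probs in frame_probs:
        best = max(range(len(probs)), key=probs.__getitem__)
        labels = [0.0] * max_speakers
        for speaker in classes[best]:
            labels[speaker] = 1.0
        output.append(labels)
    return output
```

With 3 speakers and at most 2 active at once there are 7 classes; a frame whose argmax falls on the class `{0, 1}` decodes to `[1.0, 1.0, 0.0]`.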
I am willing to contribute this feature. I plan to replace `self.model(waveform)` with `self.to_multilabel(model(waveform))`, where `self.to_multilabel` is:

- `Powerset(max_speakers_per_chunk, max_speakers_per_frame).to_multilabel` if the model uses the powerset multi-class paradigm
- `torch.nn.Identity()` if the model uses the multi-label paradigm

However, I don't want to go through the instantiation of `Powerset` on every call to `PyannoteSegmentationModel.__call__`. My question is therefore: where should I instantiate `self.to_multilabel`?
Also, `Powerset` inherits from `torch.nn.Module`, so it should ideally be moved to the same device as `self.model` (but maybe `diart` only supports CPU for now?). Therefore, I believe the right way is to define it in `PyannoteSegmentationModel.__init__`, but maybe that goes against the whole `LazyModel` design (which I don't really understand the need for).
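The idea above can be sketched without torch (all names here are hypothetical stand-ins, not diart's actual classes): the converter is chosen once at construction time, so `__call__` never re-instantiates `Powerset`.

```python
class SegmentationModelSketch:
    """Hypothetical, torch-free stand-in for PyannoteSegmentationModel,
    showing self.to_multilabel being chosen once in __init__."""

    def __init__(self, model, uses_powerset, to_multilabel=None):
        self.model = model
        if uses_powerset:
            # Real code would use Powerset(max_speakers_per_chunk,
            # max_speakers_per_frame).to_multilabel, moved to the same
            # device as the model.
            self.to_multilabel = to_multilabel
        else:
            # Real code would use torch.nn.Identity()
            self.to_multilabel = lambda segmentation: segmentation

    def __call__(self, waveform):
        # No per-call Powerset instantiation: just model + converter
        return self.to_multilabel(self.model(waveform))
```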
That's awesome! Thanks @hbredin for this contribution!

I would make this `PowersetAdapter` a wrapper around `SegmentationModel` and have `PyannoteLoader` apply the wrapper accordingly.
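One way this wrapper idea could look, as a torch-free sketch (the `load_model` helper and its signature are assumptions for illustration, not diart's actual loader API):

```python
class PowersetAdapter:
    """Hypothetical sketch: wraps a segmentation model and converts its
    powerset output to multilabel, keeping the same __call__ interface."""

    def __init__(self, model, to_multilabel):
        self.model = model
        self.to_multilabel = to_multilabel

    def __call__(self, waveform):
        return self.to_multilabel(self.model(waveform))


def load_model(model, uses_powerset, to_multilabel=None):
    # Sketch of the loader's role: wrap powerset models only, so the
    # rest of the pipeline always sees multilabel output.
    return PowersetAdapter(model, to_multilabel) if uses_powerset else model
```

The benefit of this design is that downstream pipeline code never needs to know which output paradigm the underlying checkpoint uses.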
Currently, `LazyModel` guarantees that models can be shared across processes (for example, for parallel benchmarking) and that weights are loaded only when the process starts. This may not be the best approach, so I'm open to changing it in the future.
I'm going to be working on moving pipeline configs to YAML files. This should make configs serializable, which would remove the need for `LazyModel`, so I'll refactor the entire `models.py` at that point.
Implemented in #198
Pyannote has recently released `pyannote/segmentation-3.0`, let's include it!