juanmc2005 / diart

A python package to build AI-powered real-time audio applications
https://diart.readthedocs.io
MIT License

Add compatibility with new pyannote segmentation model #186

Closed · juanmc2005 closed this 11 months ago

juanmc2005 commented 11 months ago

Pyannote has recently released pyannote/segmentation-3.0, let's include it!

juanmc2005 commented 11 months ago

This could be done in a pretty straightforward way by checking the activation layer of the loaded segmentation model. In diart.models.PyannoteSegmentationModel.__call__(), we can add the following:

import torch
from pyannote.audio.utils.powerset import Powerset

segmentation = self.model(waveform)
if isinstance(self.model.activation, torch.nn.LogSoftmax):  # or torch.nn.Softmax
    # Powerset models need their output decoded back to multi-label scores.
    # max_speakers_per_chunk and max_speakers_per_frame come from the model's specifications.
    powerset = Powerset(max_speakers_per_chunk, max_speakers_per_frame)
    return powerset.to_multilabel(segmentation)
return segmentation
hbredin commented 11 months ago

I would recommend checking self.model.specifications.powerset instead.

Also, Powerset.to_multilabel now has a soft keyword argument that you can set to True to get soft multi-label segmentation (though I would recommend sticking with hard ones to remove the need for the activation threshold).
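To make the powerset/multi-label distinction concrete, here is a toy, dependency-free sketch of the hard decoding that Powerset.to_multilabel performs. The class ordering below is an assumption for illustration; pyannote's actual internal ordering may differ.

```python
from itertools import combinations

def powerset_classes(num_speakers, max_simultaneous):
    # Enumerate powerset classes: the empty set (silence), then every
    # subset of speakers of size 1..max_simultaneous.
    classes = []
    for size in range(max_simultaneous + 1):
        classes.extend(combinations(range(num_speakers), size))
    return classes

def to_multilabel(class_index, num_speakers, max_simultaneous):
    # Map a hard powerset class index to a binary per-speaker vector.
    active = powerset_classes(num_speakers, max_simultaneous)[class_index]
    return [1 if s in active else 0 for s in range(num_speakers)]
```

For 3 speakers with at most 2 active per frame, this yields 7 classes; the hard decoding is a simple table lookup, which is why hard labels need no activation threshold.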

hbredin commented 11 months ago

I am willing to contribute this feature.

I plan to replace self.model(waveform) by self.to_multilabel(self.model(waveform)), where self.to_multilabel is either a Powerset conversion (for powerset models) or the identity (for multi-label models).

However, I don't want to have to go through the instantiation of powerset for every PyannoteSegmentationModel.__call__.

My question is therefore: where should I instantiate self.to_multilabel?

Also, Powerset inherits from torch.nn.Module, so it should ideally be sent to the same device as self.model (but maybe diart only supports CPU for now?). Therefore, I believe the right place to define it is PyannoteSegmentationModel.__init__, but maybe that goes against the whole LazyModel thing (though I don't really understand why it is necessary).

juanmc2005 commented 11 months ago

That's awesome! Thanks @hbredin for this contribution!

I would make this PowersetAdapter a wrapper of SegmentationModel and make PyannoteLoader set the wrapper accordingly.
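The wrapper idea above could look roughly like this. Names and signatures are illustrative only, not diart's actual API: the adapter wraps any segmentation model together with a decoding function and applies both on each call.

```python
class PowersetAdapter:
    """Illustrative sketch: wrap a segmentation model and decode its
    powerset output to multi-label scores on every call."""

    def __init__(self, model, to_multilabel):
        self.model = model                   # wrapped segmentation model
        self.to_multilabel = to_multilabel   # e.g. Powerset(...).to_multilabel

    def __call__(self, waveform):
        # Run the wrapped model, then decode powerset -> multilabel.
        return self.to_multilabel(self.model(waveform))
```

A loader could then decide at construction time whether to return the raw model or the wrapped one, so the decoding cost is paid only for powerset models.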

Currently, LazyModel guarantees that models can be shared across processes (for example, for parallel benchmarking) by loading the weights only once the process starts. This may not be the best approach, so I'm open to changing it in the future.
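A minimal sketch of the lazy-loading pattern described above (illustrative, not diart's actual implementation): the object holds only a picklable loader callable, so it can be cheaply shared across processes, and materializes the heavy weights on first use.

```python
class LazyModel:
    """Defer loading heavy weights until first use, so the object can be
    pickled and shipped to worker processes cheaply (illustrative sketch)."""

    def __init__(self, loader):
        self.loader = loader   # a picklable callable returning the model
        self._model = None     # weights are not loaded yet

    def load(self):
        # Materialize the model exactly once, inside the worker process.
        if self._model is None:
            self._model = self.loader()
        return self._model

    def __call__(self, *args):
        return self.load()(*args)
```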

juanmc2005 commented 11 months ago

I'm going to be working on moving pipeline configs to YAML files. This should make configs serializable, which would remove the need for LazyModel, so I'll refactor the entire models.py at that point.

juanmc2005 commented 11 months ago

Implemented in #198