juanmc2005 / diart

A python package to build AI-powered real-time audio applications
https://diart.readthedocs.io
MIT License

Optionally relax incremental clustering constraints #60

Open juanmc2005 opened 2 years ago

juanmc2005 commented 2 years ago

Problem

Cannot-link constraints are currently hard-coded in OnlineSpeakerClustering. If the segmentation model over-segments speakers, it may be better to rely on speaker embeddings to determine the identity of a speaker turn.

cc: @hbredin

Idea

Implement a different optimal mapping strategy in SpeakerMap that replaces LSAP (the Hungarian algorithm) with a simple argmax/argmin.
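
To make the difference concrete, here is a minimal standalone sketch that does not use diart at all. The function names and the toy cost matrix are made up for illustration, and the one-to-one solver is a brute-force search standing in for the Hungarian algorithm (only reasonable for tiny matrices). It assumes there are at least as many columns as rows.

```python
from itertools import permutations
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def lsap_assignments(cost: Matrix) -> List[int]:
    """One-to-one assignment with minimal total cost.

    Brute force over column permutations, used here as a stand-in
    for the Hungarian algorithm. Requires num_cols >= num_rows.
    """
    num_rows, num_cols = len(cost), len(cost[0])
    best = min(
        permutations(range(num_cols), num_rows),
        key=lambda cols: sum(cost[r][c] for r, c in enumerate(cols)),
    )
    return list(best)


def relaxed_assignments(cost: Matrix) -> List[int]:
    """Row-wise argmin: several rows may pick the same column."""
    return [min(range(len(row)), key=row.__getitem__) for row in cost]


# Hypothetical costs: rows are local speakers, columns are global centroids.
# Both local speakers are closest to centroid 0 (an over-segmented speaker).
cost_matrix = [
    [0.1, 0.9, 0.8],
    [0.2, 0.7, 0.9],
]

constrained = lsap_assignments(cost_matrix)  # forces distinct centroids
relaxed = relaxed_assignments(cost_matrix)   # lets both map to centroid 0
print(constrained, relaxed)
```

With the one-to-one constraint, the second local speaker is pushed to centroid 1 even though it is much closer to centroid 0. The relaxed version merges both into centroid 0.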

Example

A quick implementation could take advantage of the existing MappingMatrixObjective that's already detached from SpeakerMap.

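from typing import List

import numpy as np
from diart.mapping import MappingMatrixObjective, SpeakerMap

class RelaxedMinimizationObjective(MappingMatrixObjective):
    def optimal_assignments(self, matrix: np.ndarray) -> List[int]:
        # Each row independently picks its lowest-cost column (no one-to-one constraint)
        return list(np.argmin(matrix, axis=1))

# cost_matrix is a placeholder for a precomputed cost matrix (not defined in this snippet)
relaxed_mapping = SpeakerMap(cost_matrix, RelaxedMinimizationObjective())

hbredin commented 2 years ago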

I would also try to keep cannot-link constraints for overlapping speakers only (and allow merging non-overlapping speakers).

The implementation might be trickier, though.

juanmc2005 commented 2 years ago

That is a good idea. I think it would require major changes in SpeakerMap, because right now it has no way of knowing which speakers overlap. Alternatively, it might be possible to implement it in a MappingMatrixObjective subclass.
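
As a rough illustration of the overlap-only constraint discussed above, here is a standalone sketch that does not touch diart. The function name and the `overlaps` matrix are hypothetical: it assumes the caller can supply a boolean matrix saying which local speakers overlap in time, which SpeakerMap does not currently know. The strategy is a simple greedy pass, not an optimal solver, and it assumes there are enough centroids for every group of overlapping speakers.

```python
from typing import List, Sequence


def overlap_aware_assignments(
    cost: Sequence[Sequence[float]],
    overlaps: Sequence[Sequence[bool]],
) -> List[int]:
    """Greedy relaxed assignment with cannot-link only between overlapping speakers.

    cost[i][c]: cost of mapping local speaker i to global centroid c.
    overlaps[i][j]: True if local speakers i and j overlap in time (hypothetical input).
    """
    num_local = len(cost)
    assignments = [-1] * num_local  # -1 means "not assigned yet"
    # Local speakers with the lowest best-case cost choose first.
    order = sorted(range(num_local), key=lambda i: min(cost[i]))
    for i in order:
        # Centroids already taken by speakers that overlap with speaker i.
        blocked = {
            assignments[j]
            for j in range(num_local)
            if j != i and overlaps[i][j] and assignments[j] >= 0
        }
        candidates = [c for c in range(len(cost[i])) if c not in blocked]
        assignments[i] = min(candidates, key=lambda c: cost[i][c])
    return assignments


# Hypothetical example: all three local speakers are closest to centroid 0,
# but speakers 0 and 1 overlap, so they cannot share a centroid.
cost_matrix = [
    [0.1, 0.9, 0.8],
    [0.2, 0.7, 0.9],
    [0.3, 0.6, 0.9],
]
overlaps = [
    [False, True, False],
    [True, False, False],
    [False, False, False],
]

result = overlap_aware_assignments(cost_matrix, overlaps)
print(result)
```

Here speaker 1 is pushed to centroid 1 because it overlaps with speaker 0, while speaker 2 is free to merge into centroid 0.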