modelscope / 3D-Speaker

A Repository for Single- and Multi-modal Speaker Verification, Speaker Recognition and Speaker Diarization
Apache License 2.0

The problem about the selection of num_of_spk in speaker-diarization #42

Closed: leia404 closed this issue 9 months ago

leia404 commented 9 months ago

In the spectral clustering implementation in speakerlab/process/cluster.py, the following code is used to estimate the number of speakers:

lambda_gap_list = self.getEigenGaps(
    lambdas[self.min_num_spks - 1:self.max_num_spks + 1])
num_of_spk = np.argmax(lambda_gap_list) + self.min_num_spks

In other related projects, however, the number of speakers is estimated with the following code:

num_spks = num_spks if num_spks is not None \
    else cp.argmax(cp.diff(eig_values[:max_num_spks + 1])) + 1
num_spks = max(num_spks, min_num_spks)

# Another variant, from a different project:
lambda_gap_list = self.getEigenGaps(lambdas[1 : self.max_num_spkrs])

num_of_spk = (
    np.argmax(
        lambda_gap_list[
            : min(self.max_num_spkrs, len(lambda_gap_list))
        ]
    )
    if lambda_gap_list
    else 0
) + 2

What is the theoretical basis for your design? If the amount of speech per speaker is uneven, for example when one speaker says very little, is this estimate still valid? Could you point me to any relevant material? Thank you in advance for your answer.

wanghuii1 commented 9 months ago

  1. If you set self.min_num_spks to 2, our code is equivalent to the other code.
  2. A speaker who speaks very little is hard to identify. Very small clusters are ignored in spectral clustering. If you need to handle this case, another clustering method such as AHC (agglomerative hierarchical clustering) may be more appropriate. @leia404
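
A minimal sketch of the equivalence mentioned in point 1, written in plain Python so it runs without NumPy. It assumes that getEigenGaps returns the differences between consecutive eigenvalues sorted in ascending order; that helper is not shown in this thread, so this is an assumption. The function names estimate_3dspeaker and estimate_other, and the example eigenvalues, are hypothetical and only illustrate the slicing and offsets quoted above.

```python
def eigen_gaps(eigenvalues):
    """Differences between consecutive eigenvalues (assumed sorted ascending)."""
    return [b - a for a, b in zip(eigenvalues, eigenvalues[1:])]


def argmax(values):
    """Index of the first largest element, like np.argmax on a 1-D array."""
    return max(range(len(values)), key=values.__getitem__)


def estimate_3dspeaker(lambdas, min_num_spks, max_num_spks):
    # Same slice and offset as the snippet quoted from cluster.py.
    gaps = eigen_gaps(lambdas[min_num_spks - 1:max_num_spks + 1])
    return argmax(gaps) + min_num_spks


def estimate_other(lambdas, max_num_spkrs):
    # Same slice and offset as the last quoted variant: skip lambdas[0], add 2.
    gaps = eigen_gaps(lambdas[1:max_num_spkrs])
    return (argmax(gaps) if gaps else 0) + 2


# Made-up eigenvalues: three small values, then a large jump (three clusters).
lambdas = [0.0, 0.02, 0.05, 0.9, 0.95, 1.0, 1.05]

print(estimate_3dspeaker(lambdas, min_num_spks=2, max_num_spks=5))
print(estimate_other(lambdas, max_num_spkrs=5))
```

With min_num_spks set to 2, both functions start their search at lambdas[1] and add 2, so they pick the same gap on this input. The quoted slices still differ in their upper bound (max_num_spks + 1 versus max_num_spkrs), so the two estimates can differ when the largest gap sits at the very end of the range.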

leia404 commented 9 months ago

Okay, got it. Thanks for your answer!