wenet-e2e / wespeaker

Research and Production Oriented Speaker Verification, Recognition and Diarization Toolkit
Apache License 2.0

About implementing Normalized Maximum Eigengap Spectral Clustering (NME-SC) for Speaker Diarization #287

Closed: Zhubisong closed 1 month ago

Zhubisong commented 4 months ago

Thank you for uploading the pre-trained ECAPA-TDNN model.

For speaker diarization, the spectral clustering algorithm in wespeaker uses a p-neighbor binarization scheme, and "p" has to be chosen by hand. How should I choose "p" for different datasets (such as AMI, DIHARD, MagicData, Callhome, or AISHELL4)? Is 0.01 OK?
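
For context, p-neighbor binarization can be sketched roughly as follows. This is a minimal illustration, not wespeaker's actual code: the function name `p_neighbor_binarize` is hypothetical, and it assumes "p" is the fraction of largest entries kept in each row of the affinity matrix (wespeaker's own pruning rule may differ in detail).

```python
import numpy as np


def p_neighbor_binarize(affinity, p):
    """Keep the ceil(p * N) largest entries of each row as 1, zero the rest,
    then symmetrize. Hypothetical helper, for illustration only."""
    n = affinity.shape[0]
    # Number of neighbors kept per row, clamped to [1, n].
    k = min(n, max(1, int(np.ceil(p * n))))
    # Column indices of the k largest values in every row (order not needed).
    top_idx = np.argpartition(-affinity, k - 1, axis=1)[:, :k]
    binary = np.zeros_like(affinity, dtype=float)
    np.put_along_axis(binary, top_idx, 1.0, axis=1)
    # Row-wise pruning is not symmetric in general, so average with the transpose.
    return 0.5 * (binary + binary.T)


# Toy 4x4 affinity with two obvious pairs; p = 0.5 keeps 2 neighbors per row.
affinity = np.array([
    [1.0, 0.9, 0.1, 0.2],
    [0.9, 1.0, 0.2, 0.1],
    [0.1, 0.2, 1.0, 0.8],
    [0.2, 0.1, 0.8, 1.0],
])
pruned = p_neighbor_binarize(affinity, p=0.5)
```

A small "p" keeps only the strongest links, so the right value depends on how many segments a recording has and how clean the embeddings are.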

In "Auto-Tuning Spectral Clustering for Speaker Diarization Using Normalized Maximum Eigengap", the authors propose NME-SC, an algorithm that frees us from choosing "p". Could wespeaker implement this algorithm?

JiJiJiang commented 4 months ago
  1. I don't think there is a fixed "p" that performs well on all the datasets you mention, which is exactly why the NME-SC algorithm was proposed and why it works. In my experience, "p" in [0.01, 0.05] gives a decent result. You can also refer to the setup in our diarization recipe.
  2. This algorithm essentially enumerates values of "p" and finds the best one on the dev set, which is computationally costly. You can easily implement it on top of our diarization code by adding a for loop over "p". Maybe you can contribute the code when you finish it!
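
The for loop over "p" suggested above could look something like the sketch below. It is not taken from wespeaker or from any existing NME-SC implementation; the names `binarize` and `nme_select_p` are hypothetical. It assumes the selection rule from the NME-SC paper as I understand it: each candidate "p" is scored on the recording itself by the ratio of "p" to the normalized maximum eigengap of the unnormalized graph Laplacian, and the smallest ratio wins.

```python
import numpy as np


def binarize(affinity, k):
    """Keep the k largest entries per row as 1, then symmetrize (hypothetical helper)."""
    top_idx = np.argpartition(-affinity, k - 1, axis=1)[:, :k]
    binary = np.zeros_like(affinity, dtype=float)
    np.put_along_axis(binary, top_idx, 1.0, axis=1)
    return 0.5 * (binary + binary.T)


def nme_select_p(affinity, p_values, max_num_spks=8, eps=1e-10):
    """Return (best_p, estimated_num_spks) over the candidate p values.

    Assumes affinity is a square similarity matrix with at least 2 rows.
    """
    n = affinity.shape[0]
    best = None
    for p in p_values:
        k = min(n, max(1, int(np.ceil(p * n))))
        adj = binarize(affinity, k)
        # Unnormalized Laplacian L = D - A of the pruned graph.
        laplacian = np.diag(adj.sum(axis=1)) - adj
        eigvals = np.linalg.eigvalsh(laplacian)  # ascending order
        # Gaps between consecutive small eigenvalues.
        gaps = np.diff(eigvals[: max_num_spks + 1])
        # Normalized maximum eigengap: largest gap over largest eigenvalue.
        g_p = gaps.max() / (eigvals[-1] + eps)
        ratio = p / (g_p + eps)
        # Position of the largest gap gives the speaker-count estimate.
        num_spks = int(np.argmax(gaps)) + 1
        if best is None or ratio < best[0]:
            best = (ratio, p, num_spks)
    return best[1], best[2]


# Toy affinity: two clean blocks of 20 segments each.
aff = np.full((40, 40), 0.1)
aff[:20, :20] = 0.9
aff[20:, 20:] = 0.9
np.fill_diagonal(aff, 1.0)
best_p, num_spks = nme_select_p(aff, p_values=[0.5, 1.0])
```

Each candidate "p" costs one eigendecomposition, which is where the extra computation comes from.
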
JiJiJiang commented 3 months ago

This git repo may also help: Auto-Tuning-Spectral-Clustering