google / uis-rnn

This is the library for the Unbounded Interleaved-State Recurrent Neural Network (UIS-RNN) algorithm, corresponding to the paper Fully Supervised Speaker Diarization.
https://arxiv.org/abs/1810.04719
Apache License 2.0

Add support for estimation of crp_alpha #4

Open wq2012 opened 5 years ago

wq2012 commented 5 years ago

Currently in this open source version, crp_alpha is passed in as an argument.

We need to add support for estimating it from training data.
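For illustration, here is a minimal sketch of what such an estimate could look like. It is not the library's implementation. It assumes the speaker-change prior described in the paper: after a speaker change, an existing speaker k (other than the previous one) is chosen with weight equal to its number of blocks so far, and a new speaker with weight crp_alpha. The function names (`collapse_blocks`, `crp_log_likelihood`, `estimate_alpha`) and the grid of candidate values are hypothetical.

```python
import math


def collapse_blocks(labels):
    """Merge consecutive identical labels into one block per speaker turn."""
    blocks = []
    for label in labels:
        if not blocks or blocks[-1] != label:
            blocks.append(label)
    return blocks


def crp_log_likelihood(sequences, alpha):
    """Log-likelihood of the speaker choices at change points, given alpha.

    Assumption: at each speaker change, an existing speaker other than the
    previous one has weight equal to its block count, and a new speaker has
    weight alpha.
    """
    total = 0.0
    for labels in sequences:
        blocks = collapse_blocks(labels)
        if not blocks:
            continue
        counts = {blocks[0]: 1}  # block count per speaker seen so far
        for prev, cur in zip(blocks, blocks[1:]):
            others = sum(n for spk, n in counts.items() if spk != prev)
            denom = others + alpha
            if cur in counts:
                total += math.log(counts[cur] / denom)
            else:
                total += math.log(alpha / denom)
            counts[cur] = counts.get(cur, 0) + 1
    return total


def estimate_alpha(sequences, candidates):
    """Pick the candidate alpha with the highest log-likelihood (grid search)."""
    return max(candidates, key=lambda a: crp_log_likelihood(sequences, a))


# Hypothetical training labels: one list of per-frame speaker IDs per utterance.
train_labels = [["A", "A", "B", "A", "B"], ["A", "B", "B", "A"]]
best_alpha = estimate_alpha(train_labels, [0.1, 0.5, 1.0, 2.0, 10.0])
```

With data where the same two speakers keep alternating, the grid search favors a small value, since a large crp_alpha would make the repeated returns to known speakers less likely.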

fanlu commented 5 years ago

How can I fix the total number of speakers? In most call center scenarios, there are only 2 or 3 speakers.

wq2012 commented 5 years ago

@fanlu The whole idea of UIS-RNN is to handle an unbounded number of speakers by learning from examples, instead of enforcing the number of speakers.

If you train UIS-RNN on call center audio where there are always 2 or 3 speakers, it should be able to predict at most 2 or 3 speakers, without requiring additional constraints.

However, since you asked, let me create a feature request issue for it. But we likely won't work on it any time soon.

suzinia commented 4 years ago

Hi, do you have any update on this issue? Or do you have any suggestions for adjusting the input parameters when the system tends to add too many speakers?

wq2012 commented 4 years ago

@suzinia Unfortunately no, since some core members have left the team.

You can try applying #56 locally to constrain the number of speakers. It's not strictly correct, but it may solve your immediate problem.
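As a generic illustration of the idea (this is not the content of #56, which is not shown in this thread), one way to cap the speaker count is to drop the new-speaker option from the CRP-style prior once the cap is reached. The sketch below covers only that prior term, not the full decoding. The function `next_speaker_probs` and its `max_speakers` parameter are hypothetical names, and the weighting by block counts is an assumption taken from the paper's description.

```python
def next_speaker_probs(block_counts, prev_speaker, alpha, max_speakers=None):
    """Prior over the next speaker after a speaker change.

    Returns a dict mapping speaker ID to probability; the key None stands
    for "a new, previously unseen speaker". Assumption: existing speakers
    are weighted by their block counts and a new speaker by alpha.
    """
    # The previous speaker is excluded because a change has occurred.
    weights = {spk: n for spk, n in block_counts.items() if spk != prev_speaker}
    # Only allow a new speaker while the cap has not been reached.
    if max_speakers is None or len(block_counts) < max_speakers:
        weights[None] = alpha
    total = sum(weights.values())
    if total == 0:
        raise ValueError("no speaker available under the given cap")
    return {spk: w / total for spk, w in weights.items()}


# Two speakers seen so far; speaker "A" was talking before the change.
uncapped = next_speaker_probs({"A": 2, "B": 1}, "A", alpha=1.0)
capped = next_speaker_probs({"A": 2, "B": 1}, "A", alpha=1.0, max_speakers=2)
```

Here `uncapped` still leaves probability mass for a third speaker, while `capped` assigns all of it to "B". A hard cap like this overrides what the model learned from data, which is one reason such a constraint may not be strictly correct.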

suzinia commented 4 years ago

Thanks, I'll try that out!