google / uis-rnn

This is the library for the Unbounded Interleaved-State Recurrent Neural Network (UIS-RNN) algorithm, corresponding to the paper Fully Supervised Speaker Diarization.
https://arxiv.org/abs/1810.04719
Apache License 2.0

[Bug] Incorrect estimation of transition_bias #55

Closed DonkeyShot21 closed 4 years ago

DonkeyShot21 commented 4 years ago

The way you estimate the transition_bias parameter seems illogical and incorrect to me. The parameter is calculated like this in uisrnn.utils.resize_sequence:

transit_num = 0
for entry in range(len(cluster_id) - 1):
  transit_num += (cluster_id[entry] != cluster_id[entry + 1])
bias_denominator = len(cluster_id)
bias = (transit_num + 1) / bias_denominator

So you count how many speaker changes there are in the concatenated sequence and divide that by the number of segments. Apart from the fact that you should divide by the number of transitions between segments (len(cluster_id) - 1) rather than by the number of segments, your code also counts the boundary between two different sequences as a speaker change. This error arises because you concatenate the sequences first and compute transition_bias afterwards, while, I reckon, it should be the other way around.
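A minimal sketch of the fix proposed above (a hypothetical helper, not part of the uis-rnn API): count speaker changes within each sequence separately, before any concatenation, so the boundary between two different sequences is never counted as a speaker change.

```python
def estimate_transition_bias(cluster_ids):
    """Estimate transition_bias from a list of per-sequence label lists."""
    n_changes = 0      # within-sequence speaker changes
    n_transitions = 0  # within-sequence segment-to-segment transitions
    for cluster_id in cluster_ids:
        for a, b in zip(cluster_id, cluster_id[1:]):
            n_changes += (a != b)
        n_transitions += len(cluster_id) - 1
    return n_changes / n_transitions
```

For instance, estimate_transition_bias([[1, 2, 3], [2, 3, 3]]) gives 3/4 = 0.75, since the boundary between the two sequences is not counted.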

To make it clearer, here is an example. Given:

seq1 = spk1 | spk2 | spk3
seq2 = spk2 | spk3 | spk3

How it should be:

n_speaker_changes = 3
n_transitions = 4
transition_bias = 3/4

Your calculation:

concatenated_sequence = spk1 | spk2 | spk3 | spk2 | spk3 | spk3
n_speaker_changes = 4
n_segments = 6
transition_bias = 2/3

This is not only a problem with toy examples. Imagine you have 1000 short sequences as training data: the numerator gains a very big spurious term, n_sequences - 1 = 999. In my opinion, this can hurt performance in some particular cases.
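The distortion is easy to reproduce numerically. A toy check (my own construction, using integer labels): 1000 short single-speaker sequences, so the true change rate is zero.

```python
# 1000 sequences of 3 segments each, every sequence a single speaker,
# so the true number of within-sequence speaker changes is zero.
seqs = [[i, i, i] for i in range(1000)]

# Concatenate-then-count (current behavior): every one of the 999
# sequence boundaries is counted as a spurious speaker change.
cat = [label for seq in seqs for label in seq]
buggy = sum(a != b for a, b in zip(cat, cat[1:])) / len(cat)

# Count-then-pool (per sequence): no speaker ever changes.
correct = (sum(a != b for seq in seqs for a, b in zip(seq, seq[1:]))
           / sum(len(seq) - 1 for seq in seqs))

print(buggy)    # 999 / 3000 = 0.333
print(correct)  # 0.0
```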

So the estimation you provide in the code is different from the one in the paper. Not considering enforce_cluster_id_uniqueness, your estimation is something like (speaker_changes + n_sequences - 1) / n_segments, while according to the paper it should look more like speaker_changes / (n_segments - n_sequences).
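As a quick sanity check (my own arithmetic, plugging the toy example above into the two closed forms):

```python
# Toy example above: 2 sequences, 6 segments, 3 within-sequence changes.
speaker_changes, n_segments, n_sequences = 3, 6, 2

code_estimate = (speaker_changes + n_sequences - 1) / n_segments
paper_estimate = speaker_changes / (n_segments - n_sequences)

print(code_estimate)   # 4/6 = 2/3, matching the concatenated calculation
print(paper_estimate)  # 3/4, matching the per-sequence calculation
```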

Also, there is no constraint on when you calculate it: it could be computed in the fit function before calling fit_concatenated. In the unlikely case that the user calls fit_concatenated directly, there is no way to calculate it there, so maybe a check could require the user to compute transition_bias beforehand.

Sorry if I misunderstood something or got it all wrong. I'm here to help.

P.S.1: why don't you estimate crp_alpha the same way you do with transition_bias? Just because it is not possible in fit_concatenated? Estimating both parameters in the fit function seems reasonable.

P.S.2: it feels a bit inelegant and misleading to calculate transition_bias inside a function called resize_sequence.

wq2012 commented 4 years ago

Thanks @DonkeyShot21 for reporting the bugs.

This could be a real bug introduced when we ported the internal implementation to the open source project.

@AnzCol Could you look into this, and either fix the bug if you agree with @DonkeyShot21 , or provide some insight if you don't?

AnzCol commented 4 years ago

Hi Quan,

I agree with what he said. The bug could lead to a small bias in the estimated transition probability.

I will fix it offline and send it to you.

Thanks, Aonan


DonkeyShot21 commented 4 years ago

Thanks for the quick replies. It is a pleasure to contribute and make this repo even more awesome.

What do you think about P.S.1?

DonkeyShot21 commented 4 years ago

Have you just started accepting contributions and pull requests? I saw the pull request for issue #26. If you want I can make a pull request for this issue, since I have already fixed it locally.

wq2012 commented 4 years ago

Hi @DonkeyShot21 ,

We are not officially accepting pull requests, and cannot guarantee we will merge any PR. Also, since several core members have left the team, we have been really slow on the open source side of this project.

As for this specific issue, I can see you have great insight into the problem, so I would encourage you to send your PR. I will review it carefully.

DonkeyShot21 commented 4 years ago

Thanks @wq2012, will do!

wq2012 commented 4 years ago

@DonkeyShot21 Hey are you still working on #57 ?

I see you marked some comments as resolved and left some open, but there are no updates in the file changes. Did you forget to commit?

DonkeyShot21 commented 4 years ago

Hi @wq2012, yes, I am still working on it. I am very busy at the moment with work and PhD applications. I will push the changes as soon as I can.

wq2012 commented 4 years ago

@DonkeyShot21 Absolutely no worries on the PR. Feel free to work on it when you have the time. Just want to understand the status of the PR.

Good luck with your work and PhD applications!

DonkeyShot21 commented 4 years ago

Thanks!

wq2012 commented 4 years ago

@DonkeyShot21 Merged. Thanks for your contribution!

LavenderMP commented 4 years ago

@DonkeyShot21 since you mentioned #26, if you don't mind, could you send me the source code?

DonkeyShot21 commented 4 years ago

I just linked to it, and it inadvertently got closed together with this issue. That issue has nothing to do with this one.