google / uis-rnn

This is the library for the Unbounded Interleaved-State Recurrent Neural Network (UIS-RNN) algorithm, corresponding to the paper Fully Supervised Speaker Diarization.
https://arxiv.org/abs/1810.04719
Apache License 2.0

Loss calculation in prediction #58

Closed gen35 closed 5 years ago

gen35 commented 5 years ago

Describe the question

When the loss (or negative log likelihood) is calculated in '_update_beam_state', the speaker change and assignment log likelihoods are subtracted from the MSE loss. I don't understand why the MSE loss is treated as the log likelihood of sequence generation.

I think both the transition bias and the CRP alpha estimates hardly influence overall performance, since their log likelihoods are significantly smaller than the MSE loss. I tested decoding with the speaker change and assignment calculations entirely removed, and the final prediction accuracy barely changed.
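For reference, here is a minimal sketch of the score combination I am describing. The function name and the numeric magnitudes are my own illustration, not the uis-rnn source; the point is only that the observation (MSE) term can dwarf the two prior terms.

```python
import numpy as np

def combined_beam_score(mse_loss, log_p_change, log_p_assignment):
  # Illustrative only: the beam score adds the observation loss (MSE,
  # which acts like a negative log likelihood under a Gaussian
  # observation model, up to constants) and subtracts the speaker-change
  # and CRP assignment log probabilities.
  return mse_loss - log_p_change - log_p_assignment

# Made-up magnitudes: the MSE term is often hundreds of nats, while the
# prior terms contribute only a few nats, so the beam ranking is
# dominated by the observation term.
print(combined_beam_score(250.0, np.log(0.2), np.log(0.1)))  # ~253.9
```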

My background

- Have I read the README.md file?
- Have I searched for similar questions from closed issues?
- Have I tried to find the answers in the paper Fully Supervised Speaker Diarization?
- Have I tried to find the answers in the reference Speaker Diarization with LSTM?
- Have I tried to find the answers in the reference Generalized End-to-End Loss for Speaker Verification?

AnzCol commented 5 years ago

Yes, this can happen when your likelihood dominates. The model we designed provides a generative process for speakers, and that is a crucial guarantee of the correctness of our model from a statistical perspective. Moreover, the transition bias and alpha come from your prior knowledge, and they can always change your speaker assignment when your prior is extreme: try setting alpha=10^(-(10^10)) to see what happens to your prediction result.

You can, to some extent, 'learn' alpha as well using empirical Bayes methods.
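As a rough illustration of the point about extreme priors, here is the textbook CRP assignment formula (the helper name is made up and this is not taken from the uis-rnn source). A sufficiently small alpha makes the new-speaker log probability so negative that the prior overwhelms the observation term:

```python
import numpy as np

def crp_assignment_log_probs(speaker_counts, alpha):
  # Standard Chinese restaurant process: given the number of segments
  # already assigned to each existing speaker, return log probabilities
  # for joining each existing speaker and for opening a new speaker.
  counts = np.asarray(speaker_counts, dtype=float)
  total = counts.sum()
  existing = counts / (total + alpha)   # join an existing speaker
  new = alpha / (total + alpha)         # open a new speaker
  return np.log(np.concatenate([existing, [new]]))

# With a moderate alpha, the new-speaker option keeps non-trivial mass.
print(crp_assignment_log_probs([5, 3], alpha=1.0))
# With an extreme alpha (1e-100 stands in for 10^(-(10^10)), which would
# underflow), log p(new speaker) is around -232 nats, so the decoder
# essentially never opens a new speaker regardless of the MSE term.
print(crp_assignment_log_probs([5, 3], alpha=1e-100))
```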


gen35 commented 5 years ago

Thanks for the clarification.

I see now that there is no simple solution to ensure the likelihood terms remain comparable in scale.

I tested using an extreme transition bias. A value of 10^-20 moved the bias of the predicted sequences close to the one estimated from the ground truth, although I am not sure how robust this approach is.
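For context, this is how I think of the effect, assuming the speaker-change prior is a simple Bernoulli with parameter transition_bias (the helper below is my own sketch, not the library's code):

```python
import numpy as np

def speaker_change_log_prob(changed, transition_bias):
  # Log prior of a speaker-change decision under a Bernoulli transition
  # model: p(change) = transition_bias, p(stay) = 1 - transition_bias.
  p = transition_bias if changed else 1.0 - transition_bias
  return np.log(p)

# With an estimated bias around 0.3, the change/stay penalties are only
# about 1 nat each, tiny next to a large observation loss.
print(speaker_change_log_prob(True, 0.3), speaker_change_log_prob(False, 0.3))

# With an extreme bias of 1e-20, every hypothesized change costs about
# 46 nats, which is finally large enough to compete with the MSE term
# and pushes the decoder toward far fewer speaker changes.
print(speaker_change_log_prob(True, 1e-20))
```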

Closing this now, but I will further explore this issue.