Closed gen35 closed 5 years ago
Yes, this can happen when your likelihood dominates. The model we designed provides a generative process for speakers, and that is a crucial guarantee of the correctness of our model from a statistical perspective. Moreover, the transition bias and alpha come from your prior knowledge, and they can always change your speaker assignment when your prior is extreme: try setting alpha=10^(-(10^10)) to see your prediction result.
You can also "learn" alpha using empirical Bayes methods.
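As an illustration of that idea: the alpha-dependent part of the CRP partition likelihood is K·log(alpha) + lgamma(alpha) − lgamma(alpha + N), where K is the number of observed speakers and N the number of segments, so an empirical-Bayes point estimate of alpha can be found with a simple 1-D search. This is a hypothetical sketch under that assumption, not code from the repo:

```python
import math

def crp_alpha_mle(num_speakers, num_segments, grid=None):
    """Empirical-Bayes point estimate of the CRP concentration alpha:
    maximize the alpha-dependent part of the CRP partition likelihood,
    K*log(alpha) + lgamma(alpha) - lgamma(alpha + N)."""
    if grid is None:
        # log-spaced candidate values, 1e-3 .. 1e3
        grid = [10 ** (i / 10) for i in range(-30, 31)]
    def loglik(a):
        return (num_speakers * math.log(a)
                + math.lgamma(a) - math.lgamma(a + num_segments))
    return max(grid, key=loglik)
```

For K much smaller than N the objective has an interior maximum, since K·log(alpha) pulls toward large alpha while lgamma(alpha) − lgamma(alpha + N) ≈ −N·log(alpha) pulls toward small alpha.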
On Sep 10, 2019, at 11:43 AM, Quan Wang notifications@github.com wrote:
Assigned #58 to @AnzCol.
Thanks for the clarification.
I see now that there is no simple way to ensure the likelihood terms stay on a comparable scale.
I tested using an extreme transition bias: 10^-20 moved the bias of the predicted sequences close to the one estimated from the ground truth, although I am not sure how robust this approach is.
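The scale mismatch is easy to see numerically: against a per-step MSE of a few tens, a moderate transition bias contributes almost nothing, while an extreme one becomes comparable in magnitude (the MSE value below is illustrative, not measured from the model):

```python
import math

mse = 40.0                      # illustrative per-step MSE magnitude
for p in (0.5, 1e-20):          # moderate vs. extreme transition bias
    nll = -math.log(p)          # log-prior contribution of a speaker change
    print(f"bias={p:g}: -log(p)={nll:.1f} vs MSE={mse}")
# With p=0.5 the prior term (~0.7) is negligible next to the MSE;
# with p=1e-20 it is ~46, large enough to flip assignments.
```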
Closing this for now, but I will explore this issue further.
Describe the question
When the loss (negative log likelihood) is calculated in `_update_beam_state`, the speaker-change and assignment log likelihoods are subtracted from the MSE loss. I fail to understand why the MSE loss is treated as the log likelihood of sequence generation.
I think the transition bias and CRP alpha estimates hardly influence overall performance, since their log likelihoods are much smaller in magnitude than the MSE loss. I tested decoding with the speaker-change and assignment calculations removed entirely, and the final prediction accuracy barely changed.
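For context, the score combination in question can be sketched as follows. Under a fixed-variance Gaussian observation model, the MSE is the observation's negative log likelihood up to a constant, so it can be summed with the negative log priors on the same scale. This is a simplified, hypothetical sketch of the idea, not the actual `_update_beam_state` code:

```python
import numpy as np

def beam_score(mse, speaker_change, transition_bias, alpha,
               num_speakers, speaker_counts, target_speaker):
    """Simplified beam-search score: MSE (observation NLL up to a
    constant) minus log priors for speaker change and CRP assignment.
    Lower is better. Hypothetical sketch, not the real implementation."""
    score = mse
    if speaker_change:
        score -= np.log(transition_bias)          # P(change)
        total = sum(speaker_counts) + alpha
        if target_speaker < num_speakers:         # existing speaker
            score -= np.log(speaker_counts[target_speaker] / total)
        else:                                     # new speaker
            score -= np.log(alpha / total)
    else:
        score -= np.log(1.0 - transition_bias)    # P(no change)
    return score
```

With typical values (transition bias around 0.5, alpha around 1), the prior terms are on the order of 1 nat, which explains why removing them barely changes decoding when the MSE term is tens of nats.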
My background
Have I read the README.md file?
Have I searched for similar questions from closed issues?
Have I tried to find the answers in the paper Fully Supervised Speaker Diarization?
Have I tried to find the answers in the reference Speaker Diarization with LSTM?
Have I tried to find the answers in the reference Generalized End-to-End Loss for Speaker Verification?