Closed zyc1310517843 closed 5 years ago
I really don't understand your questions. Please clarify.
We can't understand the relationship between the ground-truth labels and the predicted labels.
Which part do you not understand?
It makes it impossible to find out who the speaker is.
What do you mean?
For example, I trained the model on 46 people, where train_cluster_id is [0, 0, 0, ..., 45, 45, 45], and then I ran prediction on the 46th person, where test_cluster_id is [0, 0, 0, 0, 0, ...]. The predicted result is [0, 0, 0, 0, 0, ...]. My question is: shouldn't the predicted labels be [45, 45, 45, ...]? I hope you can understand what I said.
In diarization, the labels are not absolute labels, but relative labels. It is identity-agnostic.
Labels are meaningless across utterances.
For example, if the labels in an utterance are [0, 0, 1], it means the first two segments are from one speaker, while the last segment is from a different speaker. The labels do NOT refer to any specific speaker.
If another utterance has labels [0, 1, 1], the two speakers in this utterance have no connection with the speakers in the previous utterance.
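To make "relative labels" concrete: two label sequences describe the same diarization result whenever one can be turned into the other by consistently renaming the labels. A minimal sketch (the helper name `same_partition` is mine, not from this repo):

```python
def same_partition(labels_a, labels_b):
    """Return True if the two relative label sequences describe the same
    segmentation, i.e. they are equal up to a one-to-one relabeling."""
    if len(labels_a) != len(labels_b):
        return False
    mapping = {}
    for a, b in zip(labels_a, labels_b):
        # Each label in labels_a must consistently map to one label in labels_b.
        if mapping.setdefault(a, b) != b:
            return False
    # The mapping must also be one-to-one (no two labels merged).
    return len(set(mapping.values())) == len(mapping)

# [0, 0, 1] and [1, 1, 0] are the SAME result: two segments from one
# speaker, then one segment from another. [0, 0, 1] and [0, 1, 1] are not.
```

So a predicted sequence of all 0s for a single-speaker test utterance is exactly right: it says "every segment came from the same (unnamed) speaker".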
I understand exactly what you said. Can I get the absolute labels? I want to know who the speaker is. Thank you.
If you want absolute labels, you are looking at the wrong technique and the wrong repo. That is not the problem diarization is trying to solve. What you want is speaker recognition, which is much easier than diarization: you can simply compute cosine similarity between speaker embeddings.
Describe the question
My background
Have I read the README.md file?
Have I searched for similar questions from closed issues?
Have I tried to find the answers in the paper Fully Supervised Speaker Diarization?
Have I tried to find the answers in the reference Speaker Diarization with LSTM?
Have I tried to find the answers in the reference Generalized End-to-End Loss for Speaker Verification?
Hello, we used third-party tools to generate train_sequence and train_cluster_id, and completed training. We trained on 46 people and tested on one of them. The prediction accuracy of the model was 98%. We can't understand the relationship between the ground-truth labels and the predicted labels. Although the accuracy is high, it makes it impossible to find out who the speaker is. We also don't understand the labels in the demo you gave us. Thank you for your guidance.