OpenCVnoob closed this issue 5 years ago
Hi,
You should NOT use frame-level embeddings. You should use segment-level embeddings, and the corresponding segment-level speaker labels.
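To make the distinction concrete, here is a minimal sketch of one way to turn a sequence of per-window embeddings into segment-level embeddings with matching segment-level labels. It is only an illustration, not the code used in this repo: the function names, the `max_windows_per_segment` cap, and the choice to cut segments at ground-truth speaker changes are all assumptions made for the example. The normalize-then-average step follows the general idea described in the "Speaker Diarization with LSTM" reference.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def segment_embeddings(window_embeddings, window_labels, max_windows_per_segment=4):
    """Group consecutive same-speaker windows into short segments.

    Each segment embedding is the mean of the L2-normalized window embeddings
    it covers; each segment label is the speaker label shared by those windows.
    The cap on segment length is a hypothetical parameter for this sketch.
    """
    segments, labels = [], []
    start, n = 0, len(window_embeddings)
    while start < n:
        end = start + 1
        # Extend the segment while the speaker is unchanged and the cap is not hit.
        while (end < n
               and end - start < max_windows_per_segment
               and window_labels[end] == window_labels[start]):
            end += 1
        chunk = [l2_normalize(v) for v in window_embeddings[start:end]]
        dim = len(chunk[0])
        segments.append([sum(v[i] for v in chunk) / len(chunk) for i in range(dim)])
        labels.append(window_labels[start])
        start = end
    return segments, labels


# Toy input: five 2-dimensional window embeddings from two speakers.
window_embeddings = [[1.0, 0.0], [0.0, 1.0], [3.0, 4.0], [0.0, 2.0], [0.0, 5.0]]
window_labels = ["A", "A", "A", "B", "B"]
segments, labels = segment_embeddings(window_embeddings, window_labels,
                                      max_windows_per_segment=2)
print(labels)  # one label per segment, aligned with `segments`
```

The resulting `segments` and `labels` have the same length, which is the property that matters here: every observation in the training sequence has exactly one speaker label.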
Thanks for your reply! I got it
@OpenCVnoob Hello, I have run into the same problem as you. Have you solved it? Can you tell me how to deal with this issue?
Oh, sorry, I didn't notice this until now. I am still trying to find a good way to segment audio into single-speaker segments. Besides, there is no suitable dataset available to me, so I'm not sure when I will solve this issue.
@OpenCVnoob I ran the project with my own dataset, and the printed results are bad.
Describe the question
Hi, thank you for open-sourcing it! I have read the 'README.md' file and almost all the issues under this repo, but I'm still puzzled about data pre-processing.
My understanding is that, before training the UIS-RNN, a speaker embedding network should first be trained on single-speaker utterance-level features, as described in the GE2E loss paper. After that, frame-level features generated from the raw data are fed into the embedding network to produce frame-level embeddings, and I can then use those to train my UIS-RNN. Am I right about that? I'm wondering whether these frame-level embeddings are the 'continuous d-vector embeddings (as sequences)' you mentioned here.
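As a rough illustration of the pre-processing step in question, the sketch below slides a fixed-size window over frame-level features and produces one embedding per window rather than one per frame. Everything here is hypothetical: `embed_window` is a stand-in that just averages the frames (a real system would run a trained speaker embedding network instead), and the window size and hop are arbitrary toy values.

```python
def sliding_windows(frames, window_size, hop):
    """Cut a list of frame-level feature vectors into overlapping windows."""
    if window_size <= 0 or hop <= 0:
        raise ValueError("window_size and hop must be positive")
    last_start = len(frames) - window_size
    return [frames[s:s + window_size] for s in range(0, last_start + 1, hop)]


def embed_window(window):
    """Hypothetical stand-in for an embedding network: mean over the frames."""
    dim = len(window[0])
    return [sum(frame[i] for frame in window) / len(window) for i in range(dim)]


# Toy input: 10 frames of 2-dimensional features.
frames = [[float(t), float(2 * t)] for t in range(10)]
windows = sliding_windows(frames, window_size=4, hop=2)  # 50% overlap
d_vectors = [embed_window(w) for w in windows]
print(len(frames), "frames ->", len(d_vectors), "window-level embeddings")
```

Under these assumptions the number of embeddings is set by the window and hop sizes, not by the number of frames.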
I am a newcomer to speaker diarization and this question has really confused me, so I'd be very grateful if you could help me. Thanks :)
My background
Have I read the README.md file?
Have I searched for similar questions from closed issues?
Have I tried to find the answers in the paper Fully Supervised Speaker Diarization?
Have I tried to find the answers in the reference Speaker Diarization with LSTM?
Have I tried to find the answers in the reference Generalized End-to-End Loss for Speaker Verification?