taylorlu / Speaker-Diarization

speaker diarization by uis-rnn and speaker embedding by vgg-speaker-recognition
Apache License 2.0
464 stars 121 forks

Inaccurate start and till time of slices attained #47

Open Gaurav470 opened 3 years ago

Gaurav470 commented 3 years ago

Hi @taylorlu

Thanks for your awesome integration of the UIS-RNN code. We have trained a custom UIS-RNN model on our own data and are getting reasonably decent accuracy, but in the majority of cases the slice timings at speaker changes are off by a small margin: either the start time begins a bit early, before the speaker has actually started speaking, or the till time ends early while the speaker is still speaking. Can you please shed some light on what might be causing this? Thanks in advance.

taylorlu commented 3 years ago

The timing accuracy also depends on the speaker feature. Perhaps you need to find a more robust speaker feature extractor that is resistant to noise. Also, changing the sliding window size or hop size may result in a different segmentation.
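
To illustrate the window and hop size point, here is a minimal sketch. It is not code from this repository: the function names, the millisecond values, and the idealized labelling rule (each window gets the speaker active at its center) are assumptions made for the example. It shows that even with perfect per-window speaker labels, a boundary can only land on a grid set by the hop size, so the reported start or till time can be early or late by up to about half a hop.

```python
from typing import List, Tuple

Segment = Tuple[int, int, int]  # (speaker_label, start_ms, end_ms)


def labels_to_segments(labels: List[int], win_ms: int, hop_ms: int) -> List[Segment]:
    """Merge consecutive windows with the same label into segments.

    A speaker change between window i-1 and window i is placed halfway
    between the centers of those two windows (an assumption of this sketch).
    """
    if not labels:
        return []
    segments = []
    seg_start = 0
    for i in range(1, len(labels)):
        if labels[i] != labels[i - 1]:
            # Center of window i-1 plus half a hop.
            boundary = (i - 1) * hop_ms + win_ms // 2 + hop_ms // 2
            segments.append((labels[i - 1], seg_start, boundary))
            seg_start = boundary
    # The last segment runs to the end of the last window.
    last_end = (len(labels) - 1) * hop_ms + win_ms
    segments.append((labels[-1], seg_start, last_end))
    return segments


def simulate(total_ms: int, change_ms: int, win_ms: int, hop_ms: int) -> List[Segment]:
    """Idealized diarization: speaker 0 talks until change_ms, then speaker 1.

    Each window is labelled with the speaker active at its center, which
    stands in for a perfect embedding plus clustering step.
    """
    n_windows = (total_ms - win_ms) // hop_ms + 1
    labels = [
        0 if i * hop_ms + win_ms // 2 < change_ms else 1
        for i in range(n_windows)
    ]
    return labels_to_segments(labels, win_ms, hop_ms)


if __name__ == "__main__":
    # True speaker change at 1150 ms in a 2000 ms recording.
    fine = simulate(total_ms=2000, change_ms=1150, win_ms=400, hop_ms=200)
    coarse = simulate(total_ms=2000, change_ms=1150, win_ms=800, hop_ms=400)
    print("win=400 hop=200:", fine)    # boundary at 1100 ms, 50 ms early
    print("win=800 hop=400:", coarse)  # boundary at 1000 ms, 150 ms early
```

In this toy setup a smaller hop moves the estimated boundary closer to the true change point. With real audio, shorter windows also give noisier embeddings, so the window and hop sizes are a trade-off to tune rather than values to minimize.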

Gaurav470 commented 3 years ago

Thanks @taylorlu for the prompt reply. Can you please suggest how we can make our speaker feature extractor more robust to these issues, or suggest some other speaker feature extractors we could use instead? Thanks a lot.