wq2012 / SpectralCluster

Python re-implementation of the (constrained) spectral clustering algorithms used in Google's speaker diarization papers.
https://google.github.io/speaker-id/publications/LstmDiarization/
Apache License 2.0

Embedding aggregation by segment #15

Closed kareemamrr closed 4 years ago

kareemamrr commented 4 years ago

The authors of the paper state that after extracting all embeddings they aggregate them by segment, where a segment is at most 400ms long after VAD processing. Also, a single embedding represents a 240ms window of the original signal, and consecutive windows overlap by 120ms. So two full embeddings would cover 360ms of the original signal; what about the remaining 40ms? In this issue one of the authors states that a segment has about 4 windows, but I couldn't understand how that is achieved.

wq2012 commented 4 years ago

A segment does not have to cover the entire window. As long as more than 50% of the window falls into the segment, you can count it in.

But this is just our practice that worked well for our own experiment setup. It's not necessarily the best practice for your problem.
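
To make the "more than 50%" rule concrete, here is a minimal sketch of how windows could be assigned to a segment. The function name `windows_in_segment`, its parameters, and the assumption that windows start at 0ms and advance in 120ms steps are illustrative only; this is not code from the SpectralCluster library or from the paper.

```python
def windows_in_segment(seg_start_ms, seg_end_ms,
                       window_ms=240, step_ms=120, min_fraction=0.5):
    """Return the start times (ms) of sliding windows that count toward a segment.

    Hypothetical helper: a window is counted when strictly more than
    `min_fraction` of its length falls inside [seg_start_ms, seg_end_ms).
    Windows are assumed to start at 0 ms and advance by `step_ms`.
    """
    selected = []
    # Earliest window that could still overlap the segment.
    first_index = max(0, (seg_start_ms - window_ms) // step_ms)
    win_start = first_index * step_ms
    while win_start < seg_end_ms:
        win_end = win_start + window_ms
        # Length of the intersection; negative when there is no overlap.
        overlap = min(win_end, seg_end_ms) - max(win_start, seg_start_ms)
        if overlap > min_fraction * window_ms:
            selected.append(win_start)
        win_start += step_ms
    return selected


# A 400 ms segment aligned with the window grid picks up 3 windows;
# the window starting at 240 ms counts even though it extends past 400 ms.
aligned = windows_in_segment(0, 400)

# The same length segment shifted by 100 ms picks up 4 windows.
shifted = windows_in_segment(100, 500)
```

Under these assumptions a 400ms segment collects either 3 or 4 windows depending on how it lines up with the window grid, which is consistent with "about 4 windows" per segment.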