Closed vickianand closed 5 years ago
Hi @vickianand, please see my responses below:
The order of utterances is lost due to permutation and also due to collecting same-labeled segments from different parts of the utterance.
The order of utterances is not important, and we should NOT learn anything from that. Each utterance is completely independent, containing a full conversation from multiple speakers. Multiple utterances are just multiple examples for training. And training should not depend on the order of data reading.
The order of entries within an utterance is also lost due to using the sample_permuted_segments() function.
True! But partially.
The segment permutation is intended as a data augmentation step: it sacrifices some of the ordering information, but adds more variation to the training data. If you call fit on the same input twice, it will permute it (and thus augment it) differently each time. This is important because diarization training sets are usually very small, since timestamped labels are expensive.
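To make the idea concrete, here is a minimal sketch of segment-level permutation for a single speaker. It is only an illustration, not the library's actual code: the helper names split_into_segments and permute_segments, and the example indices, are made up for this post. It assumes a "segment" is a run of consecutive frame indices belonging to the same speaker.

```python
import numpy as np


def split_into_segments(indices):
    """Split sorted frame indices into runs of consecutive integers.

    Hypothetical helper for illustration only.
    """
    indices = np.asarray(indices)
    # A new segment starts wherever the gap to the previous index is not 1.
    breaks = np.where(np.diff(indices) != 1)[0] + 1
    return np.split(indices, breaks)


def permute_segments(indices, rng):
    """Shuffle whole segments while keeping the order inside each segment."""
    segments = split_into_segments(indices)
    order = rng.permutation(len(segments))
    return np.concatenate([segments[i] for i in order])


rng = np.random.default_rng(0)
# Frames where one speaker is active: three separate segments.
speaker_indices = [0, 1, 2, 5, 6, 9, 10, 11]

# Each call draws a fresh permutation, so repeated calls can yield
# different orderings of the same three segments.
first = permute_segments(speaker_indices, rng)
second = permute_segments(speaker_indices, rng)
print(first, second)
```

Note that the frames inside each segment stay in their original order; only the order of the segments changes, which is why the ordering information is lost only partially.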
We admit that the permutation is not necessarily the best practice. It is simply what we found works best for us, and we didn't really explore all variations of the algorithm. If you find a better alternative, that would be a novel contribution. Feel free to share or publish it.
Thank you for your quick response @wq2012. Thinking of permutations as a data augmentation step makes sense if it helps improve performance on the validation set. So I'll try with and without it and see which is better for my use case. Thank you again!
From the code, what I understand is that the resize_sequence() function is used to create a list of numpy arrays, with observation vectors from the same cluster in the same array. Optionally, it uses the sample_permuted_segments() function to generate num_permutations permutations from each of those arrays. I think that by doing this we are losing the following two pieces of information about the data:
1. The order of utterances is lost due to permutation and also due to collecting same-labeled segments from different parts of the utterance.
2. The order of entries within an utterance is also lost due to using the sample_permuted_segments() function.
Could someone please help me understand why we do this shuffling of sequences?
Thanks in advance!
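For reference, the grouping step described in the question can be sketched roughly as follows. This is an illustrative reconstruction, not the library's implementation: group_by_cluster and the toy data are hypothetical, and it assumes one observation vector per row with one cluster label per row.

```python
import numpy as np


def group_by_cluster(observations, cluster_ids):
    """Return one array per cluster label, rows kept in their original order.

    Hypothetical helper for illustration only.
    """
    cluster_ids = np.asarray(cluster_ids)
    groups = []
    # dict.fromkeys keeps labels in order of first appearance.
    for label in dict.fromkeys(cluster_ids.tolist()):
        groups.append(observations[cluster_ids == label])
    return groups


# Six 2-dimensional observations from two speakers, interleaved in time.
observations = np.arange(12, dtype=float).reshape(6, 2)
cluster_ids = ["A", "A", "B", "A", "B", "B"]

groups = group_by_cluster(observations, cluster_ids)
for group in groups:
    print(group)
```

After grouping, rows 0, 1 and 3 sit next to each other in the first array even though row 2 (speaker B) originally came between them, which is the loss of cross-speaker ordering the question points out.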