Closed kjw11 closed 2 years ago
Hi, I do not think that this length was chosen specifically for diarization. Forcing the segments to be at least 400 frames long might give more robustness, but it means the segments are considerably longer than those in our diarization use case. Perhaps @Jamiroquai88 has more insight into why 400 was chosen.
Hey, I believe that is mainly because ResNet needs a fixed-length input, which isn't the case for TDNNs. As far as I remember, 400 was slightly better than 200.
Hi!
Thanks a lot for your code!
I noticed that in this recipe, the "sid/nnet3/xvector/allocate_egs_but.py" script fixes the frames-per-chunk to be 400, while the original "sre16/sid/nnet3/xvector/allocate_egs.py" sets it as a random number between min-frames-per-chunk and max-frames-per-chunk.
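To make the difference concrete, here is a minimal sketch of the two strategies. It is not the code from either script: the function names, the uniform draw, and the 200 to 400 range are assumptions for illustration only.

```python
import random


def fixed_chunk_length(frames_per_chunk=400):
    # Fixed strategy: every training chunk has the same number of frames.
    return frames_per_chunk


def random_chunk_length(min_frames_per_chunk, max_frames_per_chunk, rng):
    # Random strategy: draw a length between the two bounds (inclusive).
    # A uniform draw is an assumption; the real script may sample differently.
    return rng.randint(min_frames_per_chunk, max_frames_per_chunk)


def sample_chunk(utt_num_frames, chunk_len, rng):
    # Pick a random window of chunk_len frames inside an utterance.
    # Returns None when the utterance is too short for the requested length.
    if utt_num_frames < chunk_len:
        return None
    start = rng.randint(0, utt_num_frames - chunk_len)
    return start, chunk_len


rng = random.Random(0)  # seeded only so the sketch is repeatable
fixed = [fixed_chunk_length() for _ in range(5)]
varied = [random_chunk_length(200, 400, rng) for _ in range(5)]
print("fixed lengths: ", fixed)
print("random lengths:", varied)
print("window in a 1000-frame utterance:", sample_chunk(1000, 400, rng))
```

With the fixed strategy, any utterance shorter than 400 frames cannot yield a chunk at all in this sketch, while the random strategy can still use it whenever a shorter length is drawn.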
So I'm just not sure why this recipe does it differently. Does it relate to how we split input utterances in diarization?
Thanks!