phonexiaresearch / VBx-training-recipe


About fixed chunk size in egs preparation #8

Closed kjw11 closed 2 years ago

kjw11 commented 2 years ago

Hi!

Thanks a lot for your code!

I noticed that in this recipe, the "sid/nnet3/xvector/allocate_egs_but.py" script fixes the frames-per-chunk to be 400, while the original "sre16/sid/nnet3/xvector/allocate_egs.py" sets it as a random number between min-frames-per-chunk and max-frames-per-chunk.
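To make the difference concrete, here is a minimal sketch of the two strategies as I understand them. It is not the code from either script: the function names, the uniform sampling, and the example bounds of 200 and 400 are my own assumptions for illustration, and the actual scripts may draw lengths differently.

```python
import random

def fixed_chunk_length(frames_per_chunk=400):
    # Fixed strategy: every chunk has the same number of frames.
    return frames_per_chunk

def random_chunk_length(min_frames_per_chunk, max_frames_per_chunk, rng):
    # Random strategy (assumed uniform here): pick a length in [min, max].
    return rng.randint(min_frames_per_chunk, max_frames_per_chunk)

def sample_chunk(utt_num_frames, chunk_length, rng):
    # Pick a random start so the chunk fits inside the utterance.
    # Returns (start_frame, num_frames), or None if the utterance is too short.
    if utt_num_frames < chunk_length:
        return None
    start = rng.randint(0, utt_num_frames - chunk_length)
    return start, chunk_length

rng = random.Random(0)
fixed = [sample_chunk(1000, fixed_chunk_length(), rng) for _ in range(3)]
varied = [sample_chunk(1000, random_chunk_length(200, 400, rng), rng) for _ in range(3)]
```

With the fixed strategy, any utterance shorter than 400 frames yields no chunk at all, whereas the random strategy can still use utterances down to the minimum length.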

So I'm just not sure why this recipe does it differently. Is it related to how we split input utterances in diarization?

Thanks!

fnlandini commented 2 years ago

Hi, I do not think that this length was chosen specifically for diarization. Forcing the segments to be at least 400 frames long might give more robustness, but it means that the training segments are considerably longer than those in our diarization use case. Perhaps @Jamiroquai88 has more insight on why 400.

Jamiroquai88 commented 2 years ago

Hey, I believe that is mainly because ResNet needs a fixed-length input, which isn't the case for TDNNs. As far as I remember, 400 was slightly better than 200.
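As a rough illustration of the fixed-length constraint, here is a small sketch of stacking chunks into one batch. The `stack_chunks` helper and the list-of-frames representation are hypothetical and not taken from the recipe; the sketch only shows that chunks must share a length before they can form a single (batch, frames, feat_dim) input.

```python
def stack_chunks(chunks):
    # chunks: list of feature matrices, each a list of frames,
    # where each frame is a list of feature values.
    # Stacking into one batch only works if all chunks have the same length.
    lengths = {len(chunk) for chunk in chunks}
    if len(lengths) != 1:
        raise ValueError(f"chunks have different lengths: {sorted(lengths)}")
    batch = [[list(frame) for frame in chunk] for chunk in chunks]
    num_frames = lengths.pop()
    feat_dim = len(batch[0][0])
    return batch, (len(batch), num_frames, feat_dim)

# Three chunks of 400 frames with 2 features each stack cleanly.
same = [[[0.0, 0.0] for _ in range(400)] for _ in range(3)]
batch, shape = stack_chunks(same)  # shape is (3, 400, 2)

# A 200-frame chunk mixed with a 400-frame chunk cannot be stacked as-is.
mixed = [[[0.0, 0.0] for _ in range(200)], [[0.0, 0.0] for _ in range(400)]]
try:
    stack_chunks(mixed)
except ValueError as err:
    message = str(err)
```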