I have been trying to implement the paper "Deep clustering: Discriminative embeddings for segmentation and separation", but I am unable to create batches because each audio file has a different number of frames. I came across one sentence in the experimental setup section: "To ensure the local coherency, the mixture speech was segmented with the length of 100 frames". My understanding is that the authors divide each sample into 100-frame chunks and use each chunk as an input. Is that how the authors handle variable-length input to the LSTM?
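To make my understanding concrete, here is roughly what I have in mind: a minimal sketch, assuming the input is an STFT magnitude array of shape `(num_frames, num_bins)`. The `segment_frames` helper is my own, and the paper does not say how a trailing remainder shorter than 100 frames is handled, so I simply drop it here (padding it would be another option):

```python
import numpy as np

def segment_frames(spectrogram, segment_len=100):
    """Split a (num_frames, num_bins) spectrogram into fixed-length chunks.

    Assumption (mine, not stated in the paper): any trailing remainder
    shorter than segment_len is dropped; padding would also be plausible.
    """
    num_frames, num_bins = spectrogram.shape
    num_segments = num_frames // segment_len
    segments = [
        spectrogram[i * segment_len:(i + 1) * segment_len]
        for i in range(num_segments)
    ]
    if not segments:
        return np.empty((0, segment_len, num_bins))
    # Stack into (num_segments, segment_len, num_bins), ready for batching.
    return np.stack(segments)

# Example: a 407-frame mixture with 129 frequency bins yields 4 chunks of 100 frames.
mixture = np.random.randn(407, 129)
batchable = segment_frames(mixture)
print(batchable.shape)  # (4, 100, 129)
```

With every chunk fixed at 100 frames, chunks from different files could then be stacked freely into mini-batches, which is what I assume makes the LSTM training work despite the variable file lengths. Is that interpretation correct?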