zhr1201 / deep-clustering

A TensorFlow implementation of "Deep clustering: Discriminative embeddings for segmentation and separation"
Preprocessing of Dataset to feed into BLSTM #21

Closed (divyeshrajpura4114 closed this issue 5 years ago)

divyeshrajpura4114 commented 5 years ago

I have been trying to implement the paper "Deep clustering: Discriminative embeddings for segmentation and separation", but I am not able to create batches because each audio file has a different number of frames. I came across a sentence in the experimental setup section: "To ensure the local coherency, the mixture speech was segmented with the length of 100 frames". My understanding is that the authors divide each sample into 100-frame chunks and use each chunk as an input. Is that how the authors handle variable-length input to the LSTM?

zhr1201 commented 5 years ago

Sorry for the delay. Yes, the original paper handles 100 frames of data at a time (as the input to the LSTM).
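
A rough sketch of that kind of segmentation, for illustration only. It is not taken from this repository's code. The function name `segment_frames`, the `pad_last` option, and the 129 frequency bins are assumptions made for the example; how the paper treats a trailing chunk shorter than 100 frames is not stated in this thread, so zero-padding and dropping are both shown as options.

```python
import numpy as np


def segment_frames(features, chunk_len=100, pad_last=True):
    """Split a (num_frames, num_bins) feature matrix into fixed-length chunks.

    Returns an array of shape (num_chunks, chunk_len, num_bins).
    `pad_last` is a hypothetical option: zero-pad the final short chunk
    when True, drop it when False.
    """
    num_frames, num_bins = features.shape
    remainder = num_frames % chunk_len
    if remainder and pad_last:
        # Zero-pad along the time axis up to the next multiple of chunk_len.
        pad = chunk_len - remainder
        features = np.pad(features, ((0, pad), (0, 0)), mode="constant")
    elif remainder:
        # Discard the trailing frames that do not fill a whole chunk.
        features = features[: num_frames - remainder]
    num_chunks = features.shape[0] // chunk_len
    return features.reshape(num_chunks, chunk_len, num_bins)


# Two utterances with different frame counts (random stand-ins for spectrograms).
rng = np.random.default_rng(0)
utt_a = rng.standard_normal((250, 129))  # 250 frames -> 3 chunks after padding
utt_b = rng.standard_normal((130, 129))  # 130 frames -> 2 chunks after padding

# Every chunk has the same shape, so chunks from different files can share a batch.
batch = np.concatenate([segment_frames(utt_a), segment_frames(utt_b)], axis=0)
print(batch.shape)  # (5, 100, 129)
```

Because every chunk has the same length, chunks from different audio files can be stacked into one batch regardless of the original file lengths. If zero-padding is used, the padded frames would probably need to be masked out of the loss, which this sketch does not do.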

divyeshrajpura4114 commented 5 years ago

Thanks for the reply.