josebeo2016 / biosegment

The supporting project for BTS-E

Length of Encoding Vector #3

Closed HildaNya closed 1 month ago

HildaNya commented 1 month ago

Hi! I'm trying to tie the codebase back to the paper and am a little confused by the length of the segmentation encoding vector. According to Fig. 1 of your paper (attached here), the segmentation encoding vector has length n, where n is the number of frames. Does this mean that n (and therefore the length of the encoding vector) differs depending on the duration of each audio sample?

When I was trying to reproduce your results, I found that t-SNE's fit_transform requires vectors of the same size, so I chopped the longer vectors down until they were all the same length. That works.
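For illustration, the chopping step described above could look like the minimal sketch below. It assumes each encoding vector is a 1-D NumPy array; `truncate_to_common_length` is a hypothetical helper name, not a function from the repo. It returns a 2-D array that could be passed to a routine expecting equally sized rows, such as scikit-learn's `TSNE.fit_transform` (an assumption about which implementation is meant here).

```python
import numpy as np


def truncate_to_common_length(vectors, target_len=None):
    """Chop 1-D vectors to a shared length and stack them into a 2-D array.

    Hypothetical helper for illustration only. If target_len is None,
    the length of the shortest vector is used.
    """
    if target_len is None:
        target_len = min(len(v) for v in vectors)
    if any(len(v) < target_len for v in vectors):
        raise ValueError("target_len exceeds the length of the shortest vector")
    # Keep only the first target_len entries of every vector.
    return np.stack([np.asarray(v)[:target_len] for v in vectors])


# Toy stand-ins for encoding vectors of different lengths.
encodings = [np.arange(5), np.arange(8), np.arange(6)]
matrix = truncate_to_common_length(encodings)
print(matrix.shape)  # (3, 5)
```

Truncating to the shortest vector discards frames from longer clips, so fixing the input length before encoding is an alternative worth considering.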

But in your demo ipynb file, there doesn't seem to be any chopping performed on the encoding vectors. So perhaps there's a better solution that I may have missed?

Thank you in advance! Great work!

josebeo2016 commented 1 month ago

Hello @HildaNya. In my work I cut the audio input to 64600 samples, the same as the RawNet2 baseline. You can refer to this code in the other repo. For t-SNE, I visualize the segmentation encoding vectors after chopping, so that they all have the same length.
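A minimal sketch of fixing every clip to 64600 samples before encoding, so that all encoding vectors come out the same length. The reply only states that the input is cut to 64600 samples; the handling of shorter clips (tiling the waveform until it is long enough) is an assumption made here for illustration, and `pad_or_truncate` is a hypothetical name rather than the repo's function.

```python
import numpy as np

MAX_LEN = 64600  # number of samples quoted in the reply above


def pad_or_truncate(x, max_len=MAX_LEN):
    """Return a waveform with exactly max_len samples.

    Longer clips are cut to the first max_len samples. Shorter clips are
    repeated end to end and then cut (assumption, not confirmed by the thread).
    """
    x = np.asarray(x)
    if len(x) == 0:
        raise ValueError("cannot pad an empty waveform")
    if len(x) >= max_len:
        return x[:max_len]
    repeats = int(np.ceil(max_len / len(x)))
    return np.tile(x, repeats)[:max_len]


long_clip = np.zeros(100000)
short_clip = np.arange(10)
print(pad_or_truncate(long_clip).shape)         # (64600,)
print(pad_or_truncate(short_clip, max_len=25))  # 0..9 repeated, cut to 25 values
```

With the input length fixed this way, the number of frames n is the same for every clip, so no further chopping of the encoding vectors would be needed.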