lucidrains / BS-RoFormer

Implementation of Band Split Roformer, SOTA Attention network for music source separation out of ByteDance AI Labs
MIT License
384 stars, 13 forks

Hop length #11

Psarpei closed this issue 10 months ago

Psarpei commented 10 months ago

I am still a bit confused, because line 241 of the code contains the following comment:

    stft_hop_length = 512, # 10ms at 44100Hz, from sections 4.1, 4.4 in the paper - @faroit recommends // 2 or // 4 for better

It's true that the paper states a hop size of 10 ms at a 44.1 kHz sample rate, but by my calculation 512 samples is not 10 ms but about 11.6 ms (512 / 44100 ≈ 0.0116 s). A 10 ms hop would be 441 samples.
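The arithmetic behind this question can be checked directly. The sketch below converts between a hop length in samples and a hop duration in milliseconds at 44.1 kHz. The helper names `hop_ms` and `hop_samples` are hypothetical and not part of BS-RoFormer.

```python
SAMPLE_RATE = 44100  # Hz, the sample rate discussed in the paper


def hop_ms(hop_samples: int, sample_rate: int = SAMPLE_RATE) -> float:
    """Duration of one STFT hop in milliseconds."""
    return 1000.0 * hop_samples / sample_rate


def hop_samples(duration_ms: float, sample_rate: int = SAMPLE_RATE) -> int:
    """Hop length in samples closest to the requested duration."""
    return round(duration_ms * sample_rate / 1000.0)


print(f"512 samples = {hop_ms(512):.2f} ms")  # 512 samples = 11.61 ms
print(f"10 ms = {hop_samples(10)} samples")    # 10 ms = 441 samples
```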

Did I misunderstand something, or what is the intention here?

lucidrains commented 10 months ago

@Psarpei those hyperparameters got set through the discussion here

maybe they can weigh in?

faroit commented 10 months ago

@Psarpei 512 is just a lot more common. Also, torch's stft and istft are slow, so I suggest not making them slower with non-power-of-2 values.
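To see how 512 relates to the paper's 10 ms, here is a small sketch that picks the power-of-two hop length closest to a target duration. At 44.1 kHz the exact 10 ms value is 441 samples, whose nearest power of two is 512; the `// 2` and `// 4` variants from the code comment then give 256 and 128 samples (roughly 5.8 ms and 2.9 ms). The helper `nearest_pow2_hop` is hypothetical, and this only reproduces the reasoning about the value; it does not measure STFT speed.

```python
import math

SAMPLE_RATE = 44100  # Hz


def nearest_pow2_hop(target_ms: float, sample_rate: int = SAMPLE_RATE) -> int:
    """Power-of-two hop length (in samples) closest to target_ms."""
    exact = target_ms * sample_rate / 1000.0   # e.g. 441.0 samples for 10 ms
    lower = 2 ** math.floor(math.log2(exact))  # largest power of two <= exact
    upper = lower * 2                          # smallest power of two > exact
    return lower if exact - lower <= upper - exact else upper


hop = nearest_pow2_hop(10)
print(hop, hop // 2, hop // 4)  # 512 256 128
```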

Psarpei commented 10 months ago

Okay, thanks for your answer :) But then the resulting audio length will not match the original one. What do you suggest to handle that?

iver56 commented 10 months ago

Not all lengths are compatible. Check the length of the output you are getting and use that as the input length.
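What "compatible" means can be made concrete under some assumptions not confirmed in this thread: the model uses `torch.stft` with `center=True` and an even `n_fft`, and calls `torch.istft` without its `length` argument. In that case an input of L samples yields `1 + L // hop` frames and the inverse transform returns `hop * (L // hop)` samples, so only multiples of the hop length round-trip exactly. The helpers below are hypothetical. Besides trimming to the shorter compatible length, one option is to zero-pad up to the longer one and crop the output back afterwards, which keeps the end of the audio.

```python
def istft_output_length(num_samples: int, hop_length: int = 512) -> int:
    """Length after an STFT/iSTFT round trip, assuming torch.stft with
    center=True, an even n_fft, and torch.istft called without `length`."""
    num_frames = 1 + num_samples // hop_length  # frames produced by the STFT
    return hop_length * (num_frames - 1)        # samples rebuilt by the iSTFT


def compatible_lengths(num_samples: int, hop_length: int = 512) -> tuple[int, int]:
    """Nearest compatible lengths below (trim) and above (zero-pad)."""
    shorter = (num_samples // hop_length) * hop_length
    longer = -(-num_samples // hop_length) * hop_length  # ceiling division
    return shorter, longer


n = 44100 * 8  # 8 s of audio = 352800 samples
print(istft_output_length(n))  # 352768, i.e. 32 samples short
print(compatible_lengths(n))   # (352768, 353280)
```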

Psarpei commented 10 months ago

But that way I lose information at the end of the audio. How do you avoid that with this approach?

iver56 commented 10 months ago

If you use a compatible length, the length of the output will be the same as the length of the input. You are not losing anything this way. If you need to apply it to longer audio snippets, check out the part of the paper that describes overlap-and-average (OA) as one of several viable techniques.
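A minimal sketch of overlap-and-average follows. It runs a model on fixed-size chunks whose size is a compatible length, zero-pads the last chunk, discards the outputs that correspond to padding, and averages samples covered by more than one chunk. The chunk size, overlap, and plain unweighted average are placeholder assumptions, since the paper's exact settings are not given in this thread. `model` is a hypothetical stand-in for a separator that returns one same-length mono stem as a list of floats.

```python
from typing import Callable, List


def overlap_average(
    audio: List[float],
    model: Callable[[List[float]], List[float]],
    chunk_size: int = 512 * 8,  # assumed: a multiple of the hop length
    overlap: int = 512 * 2,     # assumed: must be smaller than chunk_size
) -> List[float]:
    """Run `model` on overlapping fixed-size chunks and average the overlaps."""
    step = chunk_size - overlap
    total = [0.0] * len(audio)  # summed model outputs per sample
    counts = [0] * len(audio)   # number of chunks covering each sample
    start = 0
    while True:
        chunk = audio[start:start + chunk_size]
        valid = len(chunk)                              # real samples in this chunk
        chunk = chunk + [0.0] * (chunk_size - valid)    # zero-pad the final chunk
        out = model(chunk)
        for i in range(valid):                          # skip outputs for padding
            total[start + i] += out[i]
            counts[start + i] += 1
        if start + chunk_size >= len(audio):
            break
        start += step
    return [t / c for t, c in zip(total, counts)]


signal = [float(i % 7) for i in range(10_000)]
result = overlap_average(signal, model=lambda x: x)  # identity "model"
print(len(result), result == signal)  # 10000 True
```

With an identity model the input comes back unchanged, which checks that every sample is covered and that the padded tail is dropped.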

Psarpei commented 10 months ago

okay thanks :)