mimbres / neural-audio-fp

https://mimbres.github.io/neural-audio-fp
MIT License
179 stars, 25 forks

Model Definition Front Strides #42

Closed (daniel-deychakiwsky closed 8 months ago)

daniel-deychakiwsky commented 9 months ago

Hey team! I'm looking into the model definition here.

Should

 front_strides=[[(1,2), (2,1)], [(1,2), (2,1)],
                [(1,2), (2,1)], [(1,2), (2,1)],
                [(1,1), (2,1)], [(1,2), (2,1)],
                [(1,1), (2,1)], [(1,2), (2,1)]],

be

 front_strides=[[(1,2), (2,1)], [(1,2), (2,1)],
                [(1,2), (2,1)], [(1,2), (2,1)],
                [(1,2), (2,1)], [(1,2), (2,1)],
                [(1,2), (2,1)], [(1,2), (2,1)]],

Or is the former intended and, if so, why?

mimbres commented 8 months ago

@daniel-deychakiwsky I can't quite remember now, but there was likely no significant reason.

We reduced dimensions by using strides instead of pooling. The input was not square (a T x F spectrogram, where T ≠ F), so using [(1,2), (2,1)] in every layer would waste parameters.
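
To make the effect of the two stride lists concrete, here is a minimal sketch that traces the feature-map size layer by layer. It is only an illustration under stated assumptions, not the repository's code. It assumes a 256 x 32 input with axes ordered (H, W), and 'same' padding so that each strided convolution yields ceil(size / stride). The helper name trace_shapes and the variable names are hypothetical.

```python
import math

def trace_shapes(input_hw, front_strides):
    """Return the (H, W) size after each layer.

    Assumes 'same' padding, so a conv with stride s maps size n to ceil(n / s).
    Each layer is a list of (stride_h, stride_w) pairs applied in order.
    """
    h, w = input_hw
    shapes = []
    for layer in front_strides:
        for stride_h, stride_w in layer:
            h = math.ceil(h / stride_h)
            w = math.ceil(w / stride_w)
        shapes.append((h, w))
    return shapes

# Stride list as currently defined in the model (layers 5 and 7 use (1,1)).
original = [[(1, 2), (2, 1)], [(1, 2), (2, 1)],
            [(1, 2), (2, 1)], [(1, 2), (2, 1)],
            [(1, 1), (2, 1)], [(1, 2), (2, 1)],
            [(1, 1), (2, 1)], [(1, 2), (2, 1)]]

# Stride list proposed in the question (identical in every layer).
uniform = [[(1, 2), (2, 1)] for _ in range(8)]

# Assumed input size; swap in the real (H, W) of your spectrogram.
input_hw = (256, 32)

original_shapes = trace_shapes(input_hw, original)
uniform_shapes = trace_shapes(input_hw, uniform)

for i, (a, b) in enumerate(zip(original_shapes, uniform_shapes), start=1):
    print(f"layer {i}: original={a} uniform={b}")
```

Under these assumptions both lists end at (1, 1). They differ only in when the shorter axis collapses: after layer 5 the original list gives (8, 2) while the uniform list gives (8, 1), so the uniform list's later stride-2 steps along that axis act on an axis that is already length 1.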