mimbres / neural-audio-fp

https://mimbres.github.io/neural-audio-fp
MIT License

Code and data set availability #1

Closed: oldmonkABA closed this issue 3 years ago

oldmonkABA commented 3 years ago

Hi

Could you give a date when the code and data set will be available?

Regards

mkudinov commented 3 years ago

I tried to replicate your results, but I couldn't even implement your network. If I read the paper correctly, the input has shape [256 x 32]. If we then apply 8 convolutions with [3 x 3] kernels and the output sizes reported in Tab. 1, we get: [1 x 256 x 32] -> [64 x 128 x 16] -> [64 x 128 x 8] -> ... [1024 x 2 x 1] -> [1024 x 1 x 1]. In the last case, Layer Normalization won't work because the layer size is [1 x 1]. Maybe I just don't understand the model, but the paper is unclear. Please share the details if you can't share the code.
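The shape collapse described above can be reproduced with a few lines of pure Python. This is a minimal sketch, not code from the repository: it assumes Keras-style 'same' padding (a strided conv outputs ceil(size / stride)), takes the paper's description that every Convlayer uses strides [(1,2), (2,1)], and uses hypothetical helpers `conv_out`, `trace` and `normalize`. It also shows why a normalization whose statistics are computed over a 1x1 spatial map only is degenerate (if the statistics also span the 1024 channels, they stay well defined).

```python
import math

def conv_out(size, stride):
    # Assumption: 'same' padding, so a strided conv outputs ceil(size / stride).
    return math.ceil(size / stride)

def trace(input_hw, strides):
    """Return the (H, W) shape after each Convlayer (a 1x3 conv, then a 3x1 conv)."""
    h, w = input_hw
    shapes = []
    for (s1h, s1w), (s2h, s2w) in strides:
        h, w = conv_out(h, s1h), conv_out(w, s1w)  # 1x3 conv
        h, w = conv_out(h, s2h), conv_out(w, s2w)  # 3x1 conv
        shapes.append((h, w))
    return shapes

def normalize(values, eps=1e-5):
    # Plain mean/variance normalization (no learned scale or shift).
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [(v - mean) / math.sqrt(var + eps) for v in values]

# Paper's simplified description: all 8 Convlayers use [(1,2), (2,1)].
paper_strides = [[(1, 2), (2, 1)]] * 8
shapes = trace((256, 32), paper_strides)
print(shapes)  # [(128, 16), (64, 8), (32, 4), (16, 2), (8, 1), (4, 1), (2, 1), (1, 1)]

# A 1x1 map holds a single value per channel; subtracting its own mean gives 0.
print(normalize([3.7]))  # [0.0]
```

Under these assumptions the width reaches 1 after the 5th Convlayer and the map is 1x1 after the 8th, which is where a spatial-only normalization stops carrying any information.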

mimbres commented 3 years ago

@mkudinov TL;DR: You're right, but the results don't change.

Fortunately, an implementation that follows the paper exactly works equally well. Let me clarify two mistakes in the paper.

```python
# from model.nnfp.py
class Convlayers():
    ...
    self.forward = Sequential([
        conv2d_1x3,
        ELU(),
        BN_1x3,   # Set BN as LayerNorm2d
        conv2d_3x1,
        ELU(),
        BN_3x1])  # Set BN as LayerNorm2d
    ...
class FingerPrinter():
    ...
    # settings for Convlayers
    input_shape=(256,32,1)
    front_hidden_ch=[128, 128, 256, 256, 512, 512, 1024, 1024]
    # per Convlayer: [stride of conv2d_1x3, stride of conv2d_3x1]
    front_strides=[[(1,2), (2,1)], [(1,2), (2,1)],  # --> (128, 16, 128) --> (64, 8, 128)
                   [(1,2), (2,1)], [(1,2), (2,1)],  # --> (32, 4, 256) --> (16, 2, 256)
                   [(1,1), (2,1)], [(1,2), (2,1)],  # --> (8, 2, 512) --> (4, 1, 512)
                   [(1,1), (2,1)], [(1,2), (2,1)]]  # --> (2, 1, 1024) --> (1, 1, 1024)
```

1) In the paper, I described every stride as [(1,2),(2,1)] for simplicity (or laziness), but the values should be corrected to those in my code. The results do not change. The last stride parameters still look a bit odd, but they also work without modification for a 2-second input of shape (256, 63, 1).
2) The Convlayer block is repeated 8 times. As you pointed out, the LN in the 8th Convlayer can be ignored.
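The corrected strides can be checked with a small pure-Python shape trace. This is a sketch, not code from the repository: it assumes Keras-style 'same' padding (a strided conv outputs ceil(size / stride)), and the helpers `conv_out` and `trace` are hypothetical. It uses `front_hidden_ch` and `front_strides` as given in the excerpt above and runs both the 1-second input (256, 32, 1) and the 2-second input (256, 63, 1).

```python
import math

def conv_out(size, stride):
    # Assumption: 'same' padding, so a strided conv outputs ceil(size / stride).
    return math.ceil(size / stride)

def trace(input_shape, hidden_ch, strides):
    """Return the (H, W, C) shape after each of the 8 Convlayers."""
    h, w, _ = input_shape
    shapes = []
    for ch, ((s1h, s1w), (s2h, s2w)) in zip(hidden_ch, strides):
        h, w = conv_out(h, s1h), conv_out(w, s1w)  # conv2d_1x3
        h, w = conv_out(h, s2h), conv_out(w, s2w)  # conv2d_3x1
        shapes.append((h, w, ch))
    return shapes

front_hidden_ch = [128, 128, 256, 256, 512, 512, 1024, 1024]
front_strides = [[(1, 2), (2, 1)], [(1, 2), (2, 1)],
                 [(1, 2), (2, 1)], [(1, 2), (2, 1)],
                 [(1, 1), (2, 1)], [(1, 2), (2, 1)],
                 [(1, 1), (2, 1)], [(1, 2), (2, 1)]]

one_sec = trace((256, 32, 1), front_hidden_ch, front_strides)
two_sec = trace((256, 63, 1), front_hidden_ch, front_strides)
print(one_sec)
print(two_sec[-1])  # (1, 1, 1024)
```

Under these assumptions both inputs end at (1, 1, 1024). With the 2-second input the width only reaches 1 in the last Convlayer (its 7th-layer output is (2, 2, 1024)), which is consistent with the last stride parameters working for both input lengths.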

mimbres commented 3 years ago

@oldmonkABA Hi, please read the news.

oldmonkABA commented 3 years ago

@mimbres I'll wait for the full code. In the meantime, I'll check out what you have uploaded.