keunwoochoi / music-auto_tagging-keras

Music auto-tagging models and trained weights in keras/theano
MIT License

FCN-4 Convolve problem #43

Closed ghost closed 5 years ago

ghost commented 5 years ago

Hello my friends, I am a beginner and I have a problem with the convolution operation in FCN-4 in "Automatic Tagging Using Deep Convolutional Neural Networks" (Table 1, page 3; image). I calculate the output as below:

A conv (3x3xkernel_size) reduces each spatial dimension of the input by 2 units. For example, for a (250x250x3) input with conv(3x3x3), the output is ((250-2=248) x (250-2=248) x 3).

I attach my calculation file below: image

Please help, thanks a lot.

keunwoochoi commented 5 years ago

Hi, it assumes we're using 'same' convolution, which means it is padded before the conv operation so that the size of its output would be equal to that of its input if there's no downsampling (stride). Check out the Keras conv2d keyword args etc.
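To make the difference concrete, here is a minimal sketch of the usual output-length arithmetic for one spatial axis under 'valid' (no padding) and 'same' padding. The helper name conv_output_length is hypothetical and only for illustration; it is not a function from this repository, and it assumes no dilation.

```python
import math


def conv_output_length(n, kernel, padding="same", stride=1):
    """Output length along one spatial axis of a convolution (hypothetical helper)."""
    if padding == "same":
        # Input is padded so that only the stride shrinks the output.
        return math.ceil(n / stride)
    if padding == "valid":
        # No padding: the kernel must fit entirely inside the input.
        return (n - kernel) // stride + 1
    raise ValueError("padding must be 'same' or 'valid'")


# The 250-wide example from the question, with a 3x3 kernel:
print(conv_output_length(250, 3, "valid"))  # 248
print(conv_output_length(250, 3, "same"))   # 250

# With 'same' padding, a 48 x 341 feature map keeps its spatial size:
print(tuple(conv_output_length(n, 3, "same") for n in (48, 341)))  # (48, 341)
```

So the "minus 2" rule in the question holds for 'valid' convolution with a 3x3 kernel, while 'same' convolution leaves the height and width unchanged.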

ghost commented 5 years ago

Hello Dr. Choi, thank you for your reply. Okay: 48x341x128 convolved with 3x3x384 gives an output of 48x341x384. But what about MP (max pooling)? For example, MP(4,5) in row 4: input >> (48 x 341 x 384), MP >> (4, 5). I think the output is (48/4=12 x 341/5=68.2 x 384), but in the image it is (24 x 85 x 384).

keunwoochoi commented 5 years ago

In that case, I think Keras discards the non-integer parts. Please check out the model.summary()!
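As a rough sketch of that behaviour, non-overlapping max pooling without padding can be modelled as integer (floor) division of each spatial dimension by the pool size. The helper pool_output_shape is hypothetical and assumes the stride equals the pool size and that the incomplete border is dropped.

```python
def pool_output_shape(shape, pool):
    """Spatial output shape of non-overlapping max pooling (hypothetical helper).

    Assumes stride == pool size and no padding, so any incomplete
    window at the border is discarded (floor division).
    """
    return tuple(n // p for n, p in zip(shape, pool))


# Start from a 96 x 1366 input and apply two pool sizes
# mentioned in this thread, (2, 4) and then (4, 5).
shape = (96, 1366)
for pool in [(2, 4), (4, 5)]:
    shape = pool_output_shape(shape, pool)
    print(pool, "->", shape)
# (2, 4) -> (48, 341)
# (4, 5) -> (12, 68)
```

Under these assumptions 341/5 = 68.2 becomes 68, which is where the fractional part goes.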

ghost commented 5 years ago

Yes, you are right: Keras floors the numbers and converts them to integers. I tried model.summary(). I opened the music_tagger_cnn.py file, deleted conv block 5, and changed the parameters of the four remaining conv blocks to the FCN-4 parameters. Then I created an object from the class and called summary: model = MusicTaggerCNN(); model.summary(). The result is:

> (venv) usr@ubuntu:~/$MY_PATH/music-auto_tagging-keras-master$ python fcn4.py 
> Using Theano backend.
> Pool{ds=(2, 4), ignore_border=True, st=(2, 4), padding=(0, 0), mode='max'}.0
> ____________________________________________________________________________________________________                     
> Layer (type)                     Output Shape          Param #     Connected to
> = = = = = = = = = =
> input_1 (InputLayer)             (None, 1, 96, 1366)   0                                            
> bn_0_freq (BatchNormalization)   (None, 1, 96, 1366)   2732        input_1[0][0]                    
> conv1 (Convolution2D)            (None, 128, 96, 1366) 320         bn_0_freq[0][0]                  
> bn1 (BatchNormalization)         (None, 128, 96, 1366) 64          conv1[0][0]                      
> elu_1 (ELU)                      (None, 128, 96, 1366) 0           bn1[0][0]                        
> pool1 (MaxPooling2D)             (None, 128, 48, 341)  0           elu_1[0][0]                      
> conv2 (Convolution2D)            (None, 384, 48, 341)  36992       pool1[0][0]                      
> bn2 (BatchNormalization)         (None, 384, 48, 341)  256         conv2[0][0]                      
> elu_2 (ELU)                      (None, 384, 48, 341)  0           bn2[0][0]                        
> pool2 (MaxPooling2D)             (None, 384, 12, 68)   0           elu_2[0][0]                      
> conv3 (Convolution2D)            (None, 768, 12, 68)   147584      pool2[0][0]                      
> bn3 (BatchNormalization)         (None, 768, 12, 68)   256         conv3[0][0]                      
> elu_3 (ELU)                      (None, 768, 12, 68)   0           bn3[0][0]                        
> pool3 (MaxPooling2D)             (None, 768, 4, 8)     0           elu_3[0][0]                      
> conv4 (Convolution2D)            (None, 2048, 4, 8)    221376      pool3[0][0]                      
> bn4 (BatchNormalization)         (None, 2048, 4, 8)    384         conv4[0][0]                      
> elu_4 (ELU)                      (None, 2048, 4, 8)    0           bn4[0][0]                        
> pool4 (MaxPooling2D)             (None, 2048, 1, 1)    0           elu_4[0][0]                      
> flatten_1 (Flatten)              (None, 2048)          0           pool4[0][0]                      
> output (Dense)                   (None, 50)            12850       flatten_1[0][0]
> = = = = = = = = = =
> Total params: 422814

This matches my calculation: image

ghost commented 5 years ago

I don't have the same problem with Table 2 (FCN-5, 6 and 7).

keunwoochoi commented 5 years ago

Hi, sorry but I don't have time to follow this up in detail. But the question is about convnets in general; I think you can look at the shapes of the weights/biases of each layer by digging into the Keras layers.
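As a sanity check on those shapes, the number of trainable parameters in a standard 2D convolution layer follows directly from the kernel size and the channel counts. The sketch below uses a hypothetical helper, conv_param_count, and assumes an ordinary (non-grouped) convolution with one bias per output channel; the 128 and 384 channel counts are taken from the example earlier in this thread.

```python
def conv_param_count(kernel_h, kernel_w, in_channels, out_channels, use_bias=True):
    """Trainable parameters of a standard 2D convolution (hypothetical helper)."""
    # One kernel_h x kernel_w x in_channels filter per output channel.
    weights = kernel_h * kernel_w * in_channels * out_channels
    # One bias value per output channel, if biases are used.
    biases = out_channels if use_bias else 0
    return weights + biases


# A 3x3 convolution taking 128 input channels to 384 output channels:
print(conv_param_count(3, 3, 128, 384))  # 442752
```

Note that the spatial size of the input (for example 48 x 341) does not appear in the formula: padding and pooling change the output shape, not the parameter count. The count reported by model.summary() depends on the filter numbers actually configured in the script.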

ghost commented 5 years ago

Thank you a lot for your time. It helped me very much.