weiliu89 / caffe

Caffe: a fast open framework for deep learning.
http://caffe.berkeleyvision.org/

reason behind kernel size 3x3 when input dimensions are 1x1 #410

Open sfraczek opened 7 years ago

sfraczek commented 7 years ago

Hi, we have found that in some layers the kernel size is 3x3 while the input dimensions are 1x1. This works because the padding is 1, but we wonder why you did not use a 1x1 kernel with padding 0 instead?
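To see why both settings are shape-compatible, here is a minimal sketch of the standard convolution output-size formula (as used by Caffe): both a 3x3 kernel with padding 1 and a 1x1 kernel with padding 0 preserve a 1x1 spatial size.

```python
def conv_out_size(in_size, kernel, pad, stride=1):
    # standard convolution output-size formula: (W + 2P - K) / S + 1
    return (in_size + 2 * pad - kernel) // stride + 1

# Both settings map a 1x1 input to a 1x1 output:
print(conv_out_size(1, kernel=3, pad=1))  # -> 1
print(conv_out_size(1, kernel=1, pad=0))  # -> 1
```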

weiliu89 commented 7 years ago

That may be due to an oversight. But it may also be helpful: some of the default boxes on the 1x1 layer might exceed the actual image region, and padding can help deal with that.

sfraczek commented 7 years ago

Thank you for your response. I don't fully understand the second sentence; could you please elaborate? The convolutions we are talking about are the ones circled in blue in kernel3x3_input1x1.png. They have kernel_size: 3, padding: 1, and their input is 1x1.

weiliu89 commented 7 years ago

For example, suppose the input image size is 300 and the scale of the boxes on the last layer (i.e. conv9_2) is 0.9, that is, 270 pixels. Then the 2:1 (or 1:2) box has a size of 270 * sqrt(2) x 270 / sqrt(2) (~381.8 x 190.9), which exceeds the image region (300 x 300). Adding some padding can let the net know about such an effect.
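The arithmetic above can be checked directly (the variable names here are just illustrative, not from the SSD code):

```python
import math

img = 300     # SSD300 input image size
scale = 0.9   # scale of the default boxes on the last layer (conv9_2)
ar = 2.0      # aspect ratio 2:1

s = scale * img           # 270 pixels
w = s * math.sqrt(ar)     # width of the 2:1 box
h = s / math.sqrt(ar)     # height of the 2:1 box

print(round(w, 1), round(h, 1))  # -> 381.8 190.9, wider than the 300x300 image
```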

sfraczek commented 7 years ago

Hey! Thank you for the support. Unfortunately I still cannot understand why padding would help. :worried:
To me it seems that the result would be the same either way. Isn't the information added by the padding all zeros anyway?
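The intuition behind this question can be sketched with numpy: with zero padding on a 1x1 input, the eight off-center weights of a 3x3 kernel are always multiplied by zero, so for any fixed set of weights the output equals what a 1x1 convolution using only the center weight would produce. (This is a minimal single-channel sketch, not the Caffe implementation.)

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal()        # the single value of a 1x1, 1-channel feature map
w = rng.standard_normal((3, 3))  # 3x3 kernel weights
b = 0.5                          # bias

# Zero-pad the 1x1 input to 3x3 (padding 1) and apply the 3x3 kernel once.
padded = np.zeros((3, 3))
padded[1, 1] = x
out_3x3 = np.sum(w * padded) + b

# Equivalent 1x1 convolution: only the center weight ever sees real data.
out_1x1 = w[1, 1] * x + b

print(np.isclose(out_3x3, out_1x1))  # -> True: the padded zeros contribute nothing
```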