zhreshold / mxnet-ssd

MXNet port of SSD: Single Shot MultiBox Object Detector. Reimplementation of https://github.com/weiliu89/caffe/tree/ssd
MIT License
763 stars 336 forks

Problem with multi_layer_feature map output shape (mobilenet ssd) #221

Open jacky4323 opened 6 years ago

jacky4323 commented 6 years ago

Hi,

I'm confused about the last layers of the feature map, which use stride 2 and kernel size 3. I also visualized the network with infer_shape, as shown below. How can a 1x1 feature map be convolved with stride 2 and kernel size 3? The output shape also looks weird.

Thanks!!

    elif network == 'mobilenet':
        from_layers = ['conv_12_relu', 'conv_14_relu', '', '', '', '', '']
        num_filters = [-1, -1, 512, 256, 256, 256, 256]
        strides = [-1, -1, 2, 2, 2, 2, 2]
        pads = [-1, -1, 1, 1, 1, 1, 1]
        sizes = get_scales(min_scale=0.15, max_scale=0.9, num_layers=len(from_layers))
        ratios = [[1,2,.5], [1,2,.5,3,1./3], [1,2,.5,3,1./3], [1,2,.5,3,1./3], \
                  [1,2,.5,3,1./3], [1,2,.5], [1,2,.5]]
        normalizations = -1
        steps = []
        return locals()
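
For reference, the usual convolution output-size formula, floor((in + 2*pad - kernel) / stride) + 1, already answers part of the question. The sketch below is only an illustration: `conv_out_size` is a hypothetical helper, not a function from this repository, and its defaults copy the kernel size 3, stride 2, and pad 1 quoted above.

```python
def conv_out_size(size, kernel=3, stride=2, pad=1):
    """Output size along one spatial dimension of a convolution.

    Hypothetical helper (not part of mxnet-ssd); defaults follow the
    kernel/stride/pad values quoted in this issue.
    """
    return (size + 2 * pad - kernel) // stride + 1


# A 1x1 input is zero-padded to 3x3, so the 3x3 kernel fits exactly once.
print(conv_out_size(1))
# A 2x2 input is padded to 4x4; with stride 2 the kernel still fits only once.
print(conv_out_size(2))
```

Under these settings the output never shrinks below 1x1, so stacking further stride-2 layers on a 1x1 map keeps producing 1x1 maps.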

image

zhreshold commented 6 years ago

Are you using a large enough data shape, say 512?

jacky4323 commented 6 years ago

Hi, yes, I use 512. The last two extra layers end up at the same scale (a 1x1 feature map). I think a 1x1 feature map can be convolved with stride 2 and kernel size 3 simply by padding zeros around it. However, if two feature maps have the same scale, their anchors will map to the same locations in the original image (or similar ones, since they differ only in size and share the same ratios in your symbol_factory.py). I also visualized data shape 608, and the result is more reasonable there because the last two feature map sizes differ (2x2 and 1x1).

    sizes = get_scales(min_scale=0.15, max_scale=0.9, num_layers=len(from_layers))
    ratios = [[1,2,.5], [1,2,.5,3,1./3], [1,2,.5,3,1./3], [1,2,.5,3,1./3],
              [1,2,.5,3,1./3], [1,2,.5], [1,2,.5]]
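
The shape arithmetic behind this observation can be sketched as follows. Everything here is an assumption made for illustration rather than code from the repository: the helper names are hypothetical, `conv_14_relu` is assumed to sit at an overall stride of 32 (the usual MobileNet total stride), each of the five extra layers is assumed to reduce size with a 3x3, stride 2, pad 1 convolution, and anchor centers are assumed to be placed at (i + 0.5) / size in normalized coordinates.

```python
def conv_out_size(size, kernel=3, stride=2, pad=1):
    """Output size along one dimension (hypothetical helper)."""
    return (size + 2 * pad - kernel) // stride + 1


def extra_layer_sizes(data_shape, backbone_stride=32, num_extra=5):
    """Feature map sizes of the extra layers appended after the backbone.

    Assumes the last backbone layer has overall stride `backbone_stride`.
    """
    size = data_shape // backbone_stride
    sizes = []
    for _ in range(num_extra):
        size = conv_out_size(size)
        sizes.append(size)
    return sizes


def anchor_centers(size):
    """Normalized (x, y) anchor centers, assuming a 0.5 cell offset."""
    return [((col + 0.5) / size, (row + 0.5) / size)
            for row in range(size) for col in range(size)]


for data_shape in (512, 608):
    tail = extra_layer_sizes(data_shape)
    print(data_shape, tail)
    print("  centers, second-to-last layer:", anchor_centers(tail[-2]))
    print("  centers, last layer:", anchor_centers(tail[-1]))
```

Under these assumptions, data shape 512 gives extra-layer sizes of 8, 4, 2, 1, 1, so the last two layers share the single anchor center (0.5, 0.5) and differ only in anchor size. Data shape 608 gives 10, 5, 3, 2, 1, so the last two layers cover different locations.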

thanks!!

data shape = 512

image

data shape = 608

image