Default box scale of DSSD & the effect of batch size

youngwanLEE commented 7 years ago

Hi, Wei.

I have implemented your DSSD on VOC and ILSVRC16.

I implemented PM (Prediction Module) + DM (Deconvolution Module) on top of the SSD-VGG-512 model and achieved 80.87 mAP on VOC07 test (trained on 07+12 from scratch).

However, when I implemented the original ResNet101-based SSD, the accuracy was only 77 mAP.

My settings are as follows:

added a default box aspect ratio of 1.6
used the conv3_x and conv5_x feature maps, following your code and your paper's description
used the same learning rate policy (initial learning rate: 0.001, multi-step)
used the same default box scales as in your SSD-300 and SSD-512 models
batch size:
SSD-300: 16/32 (4 per device) using 4 GPUs (TITAN X) --> mAP: 75.74
SSD-512: 8/24 (2 per device) using 4 GPUs (TITAN X) --> mAP: 77.44

For Res101-SSD-321:

...
# in percent %
min_ratio = 20
max_ratio = 90
step = int(math.floor((max_ratio - min_ratio) / (len(mbox_source_layers) - 2)))
min_sizes = []
max_sizes = []
for ratio in xrange(min_ratio, max_ratio + 1, step):
  min_sizes.append(min_dim * ratio / 100.)
  max_sizes.append(min_dim * (ratio + step) / 100.)
min_sizes = [min_dim * 10 / 100.] + min_sizes
max_sizes = [min_dim * 20 / 100.] + max_sizes
steps = [10, 20, 40, 80, 160, 321]
aspect_ratios = [[2], [1.6, 2, 3], [1.6, 2, 3], [1.6, 2, 3], [2], [2]]
# L2-normalize the first source layer (scale 20); -1 disables normalization.
normalizations = [20, -1, -1, -1, -1, -1]
# variance used to encode/decode prior bboxes.
if code_type == P.PriorBox.CENTER_SIZE:
  prior_variance = [0.1, 0.1, 0.2, 0.2]
else:
  prior_variance = [0.1]
flip = True
clip = False
...
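
To make the resulting scales concrete, the snippet above can be rerun as a self-contained Python 3 sketch. The function name prior_box_sizes and its parameter names are hypothetical; min_dim = 321 and six source layers are assumptions inferred from the model name and the length of steps, since the elided parts of the script are not shown.

```python
import math


def prior_box_sizes(min_dim, num_layers, min_ratio, max_ratio,
                    first_min_pct, first_max_pct):
    """Return (min_sizes, max_sizes) in pixels, one entry per source layer."""
    # Percent increment between consecutive layers. The first layer is
    # handled separately, hence num_layers - 2 intervals.
    step = int(math.floor((max_ratio - min_ratio) / (num_layers - 2)))
    # The first source layer gets its own hand-picked percentages.
    min_sizes = [min_dim * first_min_pct / 100.0]
    max_sizes = [min_dim * first_max_pct / 100.0]
    for ratio in range(min_ratio, max_ratio + 1, step):
        min_sizes.append(min_dim * ratio / 100.0)
        max_sizes.append(min_dim * (ratio + step) / 100.0)
    return min_sizes, max_sizes


# Assumed values: min_dim = 321 and six source layers (not shown in the post).
min_sizes, max_sizes = prior_box_sizes(321, 6, 20, 90, 10, 20)
for small, large in zip(min_sizes, max_sizes):
    print("min_size = %7.2f   max_size = %7.2f" % (small, large))
```

Under these assumptions the loop visits five ratios (20, 37, 54, 71, 88), so together with the prepended first layer there is one size pair per source layer, and the last max_size (105% of min_dim) is larger than the input image itself.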

For Res101-SSD-512:

...
# in percent %
min_ratio = 15
max_ratio = 90
step = int(math.floor((max_ratio - min_ratio) / (len(mbox_source_layers) - 2)))
min_sizes = []
max_sizes = []
for ratio in xrange(min_ratio, max_ratio + 1, step):
  min_sizes.append(min_dim * ratio / 100.)
  max_sizes.append(min_dim * (ratio + step) / 100.)
min_sizes = [min_dim * 7 / 100.] + min_sizes
max_sizes = [min_dim * 15 / 100.] + max_sizes
steps = [8, 16, 32, 64, 128, 256, 512]
aspect_ratios = [[1.6, 2], [1.6, 2, 3], [1.6, 2, 3], [1.6, 2, 3], [1.6, 2, 3], [2], [2]] # DSSD
# No L2 normalization on any source layer (-1 disables it).
normalizations = [-1, -1, -1, -1, -1, -1, -1]
# variance used to encode/decode prior bboxes.
if code_type == P.PriorBox.CENTER_SIZE:
  prior_variance = [0.1, 0.1, 0.2, 0.2]
else:
  prior_variance = [0.1]
flip = True
clip = False
...
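
For reference, aspect_ratios and flip together determine how many default boxes each feature-map cell receives. The sketch below assumes the usual SSD PriorBox convention (one square box at min_size, one square box at sqrt(min_size * max_size), and one box per listed ratio plus its reciprocal when flip is set); the helper names are hypothetical and the exact box ordering inside the layer may differ.

```python
import math


def priors_per_location(aspect_ratios, has_max_size=True, flip=True):
    """Number of default boxes generated at each feature-map cell."""
    # One square box at min_size, plus one at sqrt(min_size * max_size).
    count = 2 if has_max_size else 1
    # Each listed ratio adds one box, and its reciprocal when flip is set.
    count += len(aspect_ratios) * (2 if flip else 1)
    return count


def default_box_shapes(min_size, max_size, aspect_ratios, flip=True):
    """(width, height) in pixels of every default box for one source layer."""
    shapes = [(min_size, min_size)]
    side = math.sqrt(min_size * max_size)
    shapes.append((side, side))
    for ar in aspect_ratios:
        for r in ([ar, 1.0 / ar] if flip else [ar]):
            # Area stays at min_size ** 2; only the shape changes.
            shapes.append((min_size * math.sqrt(r), min_size / math.sqrt(r)))
    return shapes


# aspect_ratios copied from the Res101-SSD-512 settings above.
aspect_ratios = [[1.6, 2], [1.6, 2, 3], [1.6, 2, 3], [1.6, 2, 3],
                 [1.6, 2, 3], [2], [2]]
for layer, ars in enumerate(aspect_ratios):
    print("source layer %d: %d boxes per location"
          % (layer, priors_per_location(ars)))
```

With this convention, adding 1.6 to a layer's list adds two boxes per location (1.6 and 1/1.6), which also increases the number of outputs of that layer's prediction convolutions.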

I ran many experiments, but my results are not as good as the results in your paper.

I suspect two causes: 1) the default box scale setting, and 2) the effect of the per-device batch size.

On page 7 of your DSSD paper, you mention that

"According to our observation, a batch size smaller than 16 and trained on 4 GPUs can cause unstable results in batch normalization and hurt accuracy."

Does this mean that if I use a batch size smaller than 4 per device, accuracy will suffer because of unstable batch normalization?
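
As a rough illustration of why a small per-device batch could make batch normalization noisy, the sketch below measures how much a batch mean fluctuates at different batch sizes. It is a simplification under stated assumptions: statistics are computed per device rather than synchronized across GPUs, and each image contributes one independent unit-Gaussian sample (a real layer also pools over spatial positions, which are correlated within an image). The function name batch_mean_spread is hypothetical.

```python
import random
import statistics


def batch_mean_spread(batch_size, trials=2000, seed=0):
    """Standard deviation of the batch mean over many simulated batches."""
    rng = random.Random(seed)
    means = []
    for _ in range(trials):
        batch = [rng.gauss(0.0, 1.0) for _ in range(batch_size)]
        means.append(sum(batch) / batch_size)
    return statistics.pstdev(means)


for n in (2, 4, 8, 16):
    print("per-device batch %2d -> spread of batch mean %.3f"
          % (n, batch_mean_spread(n)))
```

The spread shrinks roughly as 1/sqrt(batch size), so the statistics seen with 2 images per device are noticeably noisier than with 4. This says nothing about how much mAP is lost; that still has to be measured.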

My questions are:

1) How should the default box scales be set for ResNet101 in DSSD? Should they be larger or smaller than those for VGG?
2) What is the effect of the per-device batch size?

Thanks in advance.

KeyKy commented 7 years ago

@youngwanLEE I found your WR-Inception+SSD. Does it perform better than VGG16 at human detection? Thanks.

youngwanLEE commented 7 years ago

@KeyKy yes, of course, various strategies for KITTI are needed.

KeyKy commented 7 years ago

@youngwanLEE Sorry to bother you again. Where did you get the ImageNet pretrained model of WR-Inception?

youngwanLEE commented 7 years ago

@KeyKy I trained the pretrained model myself.

roipony commented 7 years ago

@KeyKy can you share the DSSD code? Thanks

KeyKy commented 7 years ago

@roipony I am only talking about SSD here.

roipony commented 7 years ago

@youngwanLEE can you share the DSSD code? Thanks

CrazySssst commented 7 years ago

@youngwanLEE Did you implement ResNet+SSD successfully?

isalirezag commented 6 years ago

Can you please explain what min_ratio, step, and aspect_ratios mean?

weiliu89 / caffe

Default box scale of DSSD & the effect of batch size #624