ijkguo / mx-rcnn

Parallel Faster R-CNN implementation with MXNet.

How to use SyncBatchNorm? #107

Closed ACkuku closed 5 years ago

ACkuku commented 5 years ago

I changed BatchNorm to SyncBatchNorm in symbol_resnet.py, for example:

data_bn = mx.sym.BatchNorm(data=data, fix_gamma=True, eps=eps, 
                               use_global_stats=use_global_stats, name='bn_data')
# changed to:
data_bn = mx.sym.contrib.SyncBatchNorm(data=data, fix_gamma=True, eps=eps, 
                           use_global_stats=use_global_stats, ndev=2, name='bn_sync_data')

But when training on the VOC dataset, I got this error:

mxnet.base.MXNetError: [10:36:19] /home/travis/build/dmlc/mxnet-distro/mxnet-build/3rdparty
/mshadow/mshadow/./tensor_gpu-inl.h:62:
 Check failed: _dst.shape_ == _src.shape_ ((3,) vs. (64,)) Copy:shape mismatch

Do you know why? Thank you~

ijkguo commented 5 years ago

Also supply the `key` argument (a name of your choice), e.g. https://github.com/apache/incubator-mxnet/blob/3ae43316549bc9f7488028475861729aeaf56df1/python/mxnet/gluon/contrib/nn/basic_layers.py#L225
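Putting that suggestion together with the snippet above, a minimal sketch might look like the following (the `key` value and `eps=2e-5` here are illustrative assumptions; each SyncBatchNorm layer needs its own unique `key`):

```python
# Hedged sketch: supply a unique `key` to each SyncBatchNorm so its
# parameter buffers are not shared/collided across layers (the assumed
# cause of the (3,) vs (64,) shape-mismatch error).
try:
    import mxnet as mx

    data = mx.sym.Variable('data')
    data_bn = mx.sym.contrib.SyncBatchNorm(
        data=data,
        fix_gamma=True,
        eps=2e-5,                # illustrative value
        use_global_stats=False,
        ndev=2,
        key='bn_data',           # unique per layer, a name of your choice
        name='bn_sync_data')
    print(data_bn.name)
except ImportError:
    # MXNet not installed; the snippet above is illustrative only.
    print('mxnet not available')
```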

ACkuku commented 5 years ago

Thanks for your reply~ I trained ResNet-50 on the VOC dataset and got these results: without SyncBN, mean AP = 0.6242; with SyncBN, mean AP = 0.5654. All BN layers were changed to SyncBN, with fix_gamma=False, use_global_stats=False, ndev=2,
and these layer params fixed:

'conv0',
'stage1_unit1_conv1',
'stage1_unit1_conv2',
'stage1_unit1_conv3',
'stage1_unit1_sc',
'stage1_unit2_conv1',
'stage1_unit2_conv2',
'stage1_unit2_conv3',
'stage1_unit3_conv1',
'stage1_unit3_conv2',
'stage1_unit3_conv3'
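For illustration, a prefix list like the one above is typically turned into `fixed_param_names` for `mx.mod.Module`. A hedged sketch (this is not mx-rcnn's exact code; the helper name and the shortened prefix subset are assumptions):

```python
# Hedged sketch: collect the parameter names to freeze by matching them
# against a list of layer-name prefixes, then pass the result to
# mx.mod.Module(..., fixed_param_names=...).
import re

# Shortened illustrative subset of the frozen layers listed above.
FIXED_PREFIXES = ['conv0', 'stage1_unit1_conv1', 'stage1_unit1_sc']

def fixed_param_names(all_param_names):
    """Return every parameter whose name starts with a frozen prefix."""
    pattern = re.compile('^(' + '|'.join(FIXED_PREFIXES) + ')')
    return [name for name in all_param_names if pattern.match(name)]

params = ['conv0_weight', 'stage1_unit1_conv1_weight',
          'stage2_unit1_conv1_weight']
print(fixed_param_names(params))
# ['conv0_weight', 'stage1_unit1_conv1_weight']
```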

I don't know why the AP dropped. Do you have any idea about this? Thank you~

ijkguo commented 5 years ago

Effective batch size = 2 is still a small batch size. Maybe try setting rcnn-batch-size to more than 1.
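To see why the effective batch size matters, here is a pure-Python illustration (not the MXNet implementation) of how SyncBN pools statistics across devices: with ndev=2 and one image per GPU, the normalization statistics still come from only two samples, so they are noisy.

```python
# Hedged sketch: SyncBatchNorm aggregates batch statistics across devices,
# so the effective batch size is the sum of the per-device batch sizes.

def local_stats(batch):
    """Per-device (plain BatchNorm) mean/variance."""
    n = len(batch)
    mean = sum(batch) / n
    var = sum((x - mean) ** 2 for x in batch) / n
    return n, mean, var

def sync_stats(device_batches):
    """Cross-device (SyncBN-style) mean/variance over the pooled batch."""
    n = sum(len(b) for b in device_batches)
    mean = sum(sum(b) for b in device_batches) / n
    var = sum((x - mean) ** 2 for b in device_batches for x in b) / n
    return n, mean, var

# Two devices, one image each: effective batch size is only 2.
dev0, dev1 = [0.0], [4.0]
print(sync_stats([dev0, dev1]))  # (2, 2.0, 4.0)
```

With only two samples per statistic, SyncBN can still underperform; raising the per-device batch size grows the pooled sample and stabilizes the estimates.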