Experiment Results:
Model: Faster R-CNN (ResNet-50 backbone) without OHEM and Deformable RoI Pooling
Dataset: trained on VOC 07+12, tested on VOC 07
Deformable-V1 (with dcn in stage5):
mAP@0.5 | aeroplane | bicycle | bird | boat | bottle | bus | car | cat | chair | cow |
---|---|---|---|---|---|---|---|---|---|---|
0.7836 | 0.8004 | 0.8071 | 0.7909 | 0.7092 | 0.6297 | 0.8582 | 0.8697 | 0.8951 | 0.6366 | 0.8516 |

diningtable | dog | horse | motorbike | person | pottedplant | sheep | sofa | train | tvmonitor |
---|---|---|---|---|---|---|---|---|---|
0.7121 | 0.8822 | 0.8837 | 0.8162 | 0.7965 | 0.5449 | 0.7787 | 0.7764 | 0.8725 | 0.7613 |
Deformable-V2 (with mdcn in stage5):
mAP@0.5 | aeroplane | bicycle | bird | boat | bottle | bus | car | cat | chair | cow |
---|---|---|---|---|---|---|---|---|---|---|
0.7872 | 0.8025 | 0.8378 | 0.7808 | 0.7019 | 0.6241 | 0.8600 | 0.8650 | 0.8937 | 0.6351 | 0.8645 |

diningtable | dog | horse | motorbike | person | pottedplant | sheep | sofa | train | tvmonitor |
---|---|---|---|---|---|---|---|---|---|
0.7366 | 0.8848 | 0.8853 | 0.8268 | 0.7977 | 0.5161 | 0.7823 | 0.7799 | 0.8785 | 0.7911 |
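
The mAP@0.5 column is the unweighted mean of the 20 per-class AP values. The following is a minimal Python sketch that recomputes it from the numbers in the tables above and lists the per-class change from V1 to V2. The names `AP_V1`, `AP_V2`, `mean_ap` and `per_class_gain` are illustrative only and are not part of this repository.

```python
# Per-class AP values copied from the tables above (VOC 07 test, IoU 0.5).
CLASSES = [
    "aeroplane", "bicycle", "bird", "boat", "bottle", "bus", "car", "cat",
    "chair", "cow", "diningtable", "dog", "horse", "motorbike", "person",
    "pottedplant", "sheep", "sofa", "train", "tvmonitor",
]
AP_V1 = [
    0.8004, 0.8071, 0.7909, 0.7092, 0.6297, 0.8582, 0.8697, 0.8951,
    0.6366, 0.8516, 0.7121, 0.8822, 0.8837, 0.8162, 0.7965, 0.5449,
    0.7787, 0.7764, 0.8725, 0.7613,
]
AP_V2 = [
    0.8025, 0.8378, 0.7808, 0.7019, 0.6241, 0.8600, 0.8650, 0.8937,
    0.6351, 0.8645, 0.7366, 0.8848, 0.8853, 0.8268, 0.7977, 0.5161,
    0.7823, 0.7799, 0.8785, 0.7911,
]


def mean_ap(aps):
    """Unweighted mean of the per-class average precisions."""
    return sum(aps) / len(aps)


def per_class_gain(base, new):
    """AP difference (new - base) for every class, rounded to 4 decimals."""
    return {c: round(n - b, 4) for c, b, n in zip(CLASSES, base, new)}


print("V1 mAP: %.4f" % mean_ap(AP_V1))
print("V2 mAP: %.4f" % mean_ap(AP_V2))
for name, gain in sorted(per_class_gain(AP_V1, AP_V2).items(), key=lambda kv: kv[1]):
    print("%-12s %+.4f" % (name, gain))
```

The recomputed means should agree with the reported mAP values to within the four-decimal rounding used in the tables.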
Move deformable_conv_layer.cpp and deformable_conv_layer.cu to yourcaffepath/src/caffe/layers/
Move modulated_deformable_conv_layer.cpp and modulated_deformable_conv_layer.cu to yourcaffepath/src/caffe/layers/
Move deformable_conv_layer.hpp and modulated_deformable_conv_layer.hpp to yourcaffepath/include/caffe/layers/
Move deformable_im2col.hpp and modulated_deformable_im2col.hpp to yourcaffepath/include/caffe/util/
Move deformable_im2col.cu and modulated_deformable_im2col.cu to yourcaffepath/src/caffe/util/
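
The file placement steps above can also be scripted. This is a minimal Python sketch, not a tool shipped with this repository: the function name `install_layers` is hypothetical, it assumes all ten source files sit directly in one directory (`repo_dir`), and it copies the files rather than moving them so the checkout stays intact.

```python
import shutil
from pathlib import Path

# Destination directory (relative to the Caffe root) for each file named above.
DESTINATIONS = {
    "src/caffe/layers": [
        "deformable_conv_layer.cpp",
        "deformable_conv_layer.cu",
        "modulated_deformable_conv_layer.cpp",
        "modulated_deformable_conv_layer.cu",
    ],
    "include/caffe/layers": [
        "deformable_conv_layer.hpp",
        "modulated_deformable_conv_layer.hpp",
    ],
    "include/caffe/util": [
        "deformable_im2col.hpp",
        "modulated_deformable_im2col.hpp",
    ],
    "src/caffe/util": [
        "deformable_im2col.cu",
        "modulated_deformable_im2col.cu",
    ],
}


def install_layers(repo_dir, caffe_root):
    """Copy the layer sources from repo_dir into an existing Caffe tree.

    Assumption: every file listed in DESTINATIONS lives directly in repo_dir.
    Returns the list of destination paths that were written.
    """
    repo_dir, caffe_root = Path(repo_dir), Path(caffe_root)

    # Validate everything first so a failure does not leave a half-copied tree.
    for subdir, names in DESTINATIONS.items():
        if not (caffe_root / subdir).is_dir():
            raise NotADirectoryError(caffe_root / subdir)
        for name in names:
            if not (repo_dir / name).is_file():
                raise FileNotFoundError(repo_dir / name)

    copied = []
    for subdir, names in DESTINATIONS.items():
        for name in names:
            target = caffe_root / subdir / name
            shutil.copy2(repo_dir / name, target)
            copied.append(target)
    return copied


# Example call (both paths are placeholders):
# install_layers(".", "yourcaffepath")
```

Caffe still has to be rebuilt afterwards so that the new layers are compiled in.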
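Edit caffe.proto. Add these two fields to message LayerParameter (the field numbers only need to be unused within LayerParameter):
optional DeformableConvolutionParameter deformable_convolution_param = 999999;
optional ModulatedDeformableConvolutionParameter modulated_deformable_convolution_param = 9999999;
Then add the following two message definitions at the top level of caffe.proto: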
message DeformableConvolutionParameter {
  optional uint32 num_output = 1;
  optional bool bias_term = 2 [default = true];
  repeated uint32 pad = 3; // The padding size; defaults to 0
  repeated uint32 kernel_size = 4; // The kernel size
  repeated uint32 stride = 6; // The stride; defaults to 1
  repeated uint32 dilation = 18; // The dilation; defaults to 1
  optional uint32 pad_h = 9 [default = 0]; // The padding height (2D only)
  optional uint32 pad_w = 10 [default = 0]; // The padding width (2D only)
  optional uint32 kernel_h = 11; // The kernel height (2D only)
  optional uint32 kernel_w = 12; // The kernel width (2D only)
  optional uint32 stride_h = 13; // The stride height (2D only)
  optional uint32 stride_w = 14; // The stride width (2D only)
  optional uint32 group = 5 [default = 1];
  optional uint32 deformable_group = 25 [default = 1];
  optional FillerParameter weight_filler = 7; // The filler for the weight
  optional FillerParameter bias_filler = 8; // The filler for the bias
  enum Engine {
    DEFAULT = 0;
    CAFFE = 1;
    CUDNN = 2;
  }
  optional Engine engine = 15 [default = DEFAULT];
  optional int32 axis = 16 [default = 1];
  optional bool force_nd_im2col = 17 [default = false];
}
message ModulatedDeformableConvolutionParameter {
  optional uint32 num_output = 1;
  optional bool bias_term = 2 [default = true];
  repeated uint32 pad = 3; // The padding size; defaults to 0
  repeated uint32 kernel_size = 4; // The kernel size
  repeated uint32 stride = 6; // The stride; defaults to 1
  repeated uint32 dilation = 18; // The dilation; defaults to 1
  optional uint32 pad_h = 9 [default = 0]; // The padding height (2D only)
  optional uint32 pad_w = 10 [default = 0]; // The padding width (2D only)
  optional uint32 kernel_h = 11; // The kernel height (2D only)
  optional uint32 kernel_w = 12; // The kernel width (2D only)
  optional uint32 stride_h = 13; // The stride height (2D only)
  optional uint32 stride_w = 14; // The stride width (2D only)
  optional uint32 group = 5 [default = 1];
  optional uint32 deformable_group = 25 [default = 1];
  optional FillerParameter weight_filler = 7; // The filler for the weight
  optional FillerParameter bias_filler = 8; // The filler for the bias
  enum Engine {
    DEFAULT = 0;
    CAFFE = 1;
    CUDNN = 2;
  }
  optional Engine engine = 15 [default = DEFAULT];
  optional int32 axis = 16 [default = 1];
  optional bool force_nd_im2col = 17 [default = false];
}
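
The deformable_group field sets how many independent groups of sampling offsets a layer uses. In the Deformable ConvNets formulation, each group carries one 2-D offset per kernel position, and the modulated (V2) variant adds one scalar mask value per kernel position. The Python sketch below computes the channel counts that the offset-predicting (and mask-predicting) convolution would need under that formulation. The function names are hypothetical, and how this implementation expects the offset and mask blobs to be passed in (separate or concatenated) is not stated in this README, so check the layer source before wiring a prototxt.

```python
def offset_channels(kernel_h, kernel_w, deformable_group=1):
    """Channels of the offset map: one 2-D offset per kernel position per group."""
    return 2 * kernel_h * kernel_w * deformable_group


def mask_channels(kernel_h, kernel_w, deformable_group=1):
    """Channels of the modulation mask (V2 only): one scalar per kernel position per group."""
    return kernel_h * kernel_w * deformable_group


# A 3x3 kernel with the default deformable_group = 1:
print(offset_channels(3, 3))   # offsets for both V1 and V2
print(mask_channels(3, 3))     # extra mask channels for V2

# The same kernel with deformable_group = 4:
print(offset_channels(3, 3, 4), mask_channels(3, 3, 4))
```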
Deformable_ConvNet_V1 in ResNet:
Deformable_ConvNet_V2 in ResNet:
Thanks to the official MXNet code
Thanks to unsky