msracver / Relation-Networks-for-Object-Detection

Relation Networks for Object Detection
MIT License

FPN baseline #27

Closed by FishYuLi 5 years ago

FishYuLi commented 5 years ago

Hi! This is great work! But I wonder why the FPN baseline (2FC + softnms(0.6), ResNet-101) at 36.8 mAP is lower than the Detectron FPN baseline (R-101-FPN) at 38.5 mAP. Both use ResNet-101 and pre-computed proposals. Is there anything different in the implementation?

chengdazhi commented 5 years ago

One difference is that Detectron uses RoIAlign, whereas we use RoIPool.
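To make the RoIPool vs. RoIAlign distinction concrete, here is a minimal 1D sketch. It is a simplification written for illustration, not the code of either repository: `roi_pool_1d` and `roi_align_1d` are hypothetical names, the RoIAlign version takes a single sample per bin, and the RoIPool version treats the RoI end as exclusive. The point it shows is that RoIPool snaps the RoI to the integer feature grid, while RoIAlign keeps fractional coordinates and interpolates.

```python
import numpy as np

def roi_pool_1d(feat, start, end, bins):
    """Simplified RoIPool: round the RoI to integer cells, then max-pool each bin."""
    s = int(np.floor(start + 0.5))              # quantize the RoI start
    e = max(int(np.floor(end + 0.5)), s + 1)    # quantize the RoI end, keep >= 1 cell
    edges = np.linspace(s, e, bins + 1)
    out = []
    for b in range(bins):
        lo = int(np.floor(edges[b]))            # bin edges are quantized again
        hi = max(int(np.ceil(edges[b + 1])), lo + 1)
        out.append(feat[lo:hi].max())
    return np.array(out)

def roi_align_1d(feat, start, end, bins):
    """Simplified RoIAlign: no rounding, one linearly interpolated sample per bin."""
    width = (end - start) / bins
    centers = start + width * (np.arange(bins) + 0.5)
    return np.interp(centers, np.arange(len(feat)), feat)

# Feature value equals its position, so the output reveals where each op "looked".
feat = np.arange(8.0)
pool_a = roi_pool_1d(feat, 1.3, 5.3, bins=2)
pool_b = roi_pool_1d(feat, 1.4, 5.4, bins=2)    # RoI shifted by 0.1 cell
align_a = roi_align_1d(feat, 1.3, 5.3, bins=2)
align_b = roi_align_1d(feat, 1.4, 5.4, bins=2)
```

In this toy setup the 0.1-cell shift leaves the RoIPool output unchanged, while the RoIAlign output moves with the RoI. That lost sub-cell alignment is the kind of error RoIAlign is meant to remove.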

FishYuLi commented 5 years ago

@chengdazhi Yes, I noticed that. But I'm not sure RoIAlign alone can account for a difference of nearly 1.7 points. The Mask R-CNN paper shows that it may lead to an improvement of about 1 point, which leaves me a little confused. Thanks a lot. I may try RoIAlign later.

chengdazhi commented 5 years ago

Another possibility is that we train FPN in a two-stage manner. This speeds up training but can harm accuracy.

FishYuLi commented 5 years ago

@chengdazhi The 38.5 result also comes from two-stage training with pre-computed proposals and the 1x schedule. They got 39.4 with end-to-end training.

chengdazhi commented 5 years ago

Different pretrained ResNet models could be another possible cause.

stupidZZ commented 5 years ago

@FishYuLi Some other known differences are:

1. Detectron changes the learning rate by a factor of 2, which gives a 0.2~0.4 mAP improvement.
2. Weight decay is implemented slightly differently in MXNet and Caffe2. This may lead to a 0.1~0.3 mAP gap compared with the best results, depending on how many GPUs you use.
3. We turned off warmup.
4. Detectron adopts Xavier init; we use random Gaussian init.
5. We use softnms=0.6, while Detectron uses NMS=0.5. With plain RoI pooling (RoIPool), softnms gives a 0.4~0.5 mAP gain over NMS, but with RoIAlign the gain is only 0.2 mAP.
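For readers unfamiliar with the NMS vs. Soft-NMS difference mentioned in the last point, the following is a rough sketch of both. It is an illustration under assumptions, not the implementation used in either codebase: the function names are hypothetical, the Soft-NMS shown is the linear-decay variant, and the thread does not say which variant this repo uses or exactly what the 0.6 parameterizes.

```python
import numpy as np

def iou(box, boxes):
    """IoU between one box and an array of boxes, all in (x1, y1, x2, y2) format."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)

def hard_nms(boxes, scores, thresh=0.5):
    """Standard NMS: discard every box whose IoU with a kept box exceeds thresh."""
    order = np.argsort(-scores)
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        rest = order[1:]
        if rest.size == 0:
            break
        order = rest[iou(boxes[i], boxes[rest]) <= thresh]
    return keep

def soft_nms_linear(boxes, scores, thresh=0.6, min_score=0.001):
    """Linear Soft-NMS (assumed variant): decay overlapping scores instead of dropping."""
    scores = scores.astype(float).copy()
    remaining = list(range(len(scores)))
    keep = []  # list of (index, final score), in selection order
    while remaining:
        best = max(remaining, key=lambda j: scores[j])
        remaining.remove(best)
        if scores[best] < min_score:
            break
        keep.append((best, float(scores[best])))
        if not remaining:
            break
        idx = np.array(remaining)
        ov = iou(boxes[best], boxes[idx])
        # Boxes overlapping more than thresh get their score scaled by (1 - IoU).
        scores[idx] *= np.where(ov > thresh, 1.0 - ov, 1.0)
    return keep

# Toy data: boxes 0 and 1 overlap heavily, box 2 is far away.
boxes = np.array([[0, 0, 10, 10], [1, 0, 11, 10], [20, 20, 30, 30]], dtype=float)
scores = np.array([0.9, 0.8, 0.7])
hard_kept = hard_nms(boxes, scores, thresh=0.5)
soft_kept = soft_nms_linear(boxes, scores, thresh=0.6)
```

On this toy input, hard NMS drops the heavily overlapping second box outright, while the soft variant keeps it with a reduced score. That retained low-score detection is where Soft-NMS can pick up extra recall, which is one way to read the small mAP gains quoted above.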

FishYuLi commented 5 years ago

@stupidZZ Sounds more reasonable. Thanks! : )