Why is RefineDet classified as a one-stage framework?

sfzhang15 / RefineDet

Single-Shot Refinement Neural Network for Object Detection, CVPR, 2018

Why is RefineDet classified as a one-stage framework? #171

Closed ZhiqiJiang closed 5 years ago

ZhiqiJiang commented 5 years ago

@sfzhang15 Why is RefineDet classified as a one-stage framework? In my opinion, it is a two-stage framework. The RefineDet ODM takes the feature maps of the whole input image, together with the refined anchors, as input, whereas the second stage of Faster R-CNN takes feature maps cropped from the whole-image feature map according to the anchors refined in the first stage. Therefore, RefineDet does not need an RoI pooling layer. I'm not sure my understanding is right; if it is not, I would appreciate it if you could point out the error.

sfzhang15 commented 5 years ago

@0801130205 Two-stage detectors use the second stage to classify and refine each candidate box via a region-wise subnetwork (i.e., another “shot”), which is effective but time-consuming. One-stage detectors (e.g., DSSD [13], RetinaNet [28], RON [24]) have no second stage, so they run fast. In our opinion, the key aspect that distinguishes one-stage from two-stage detectors is whether the detector has a region-wise operation. Built on the single-shot framework, RefineDet achieves two-step classification and regression without the time-consuming region-wise operation (e.g., RoIPooling). This is the main reason RefineDet can achieve state-of-the-art performance at real-time speed.
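To illustrate the two-step regression described above, here is a minimal sketch in plain Python. It is not code from the RefineDet repository. It assumes the common center-size box parameterization `(cx, cy, w, h)` and omits the variance scaling that SSD-style code usually applies. The names `decode` and `two_step_decode` are hypothetical. Both sets of offsets are assumed to come from convolutions over the whole feature map, so the only per-anchor work is cheap box arithmetic and no features are cropped per region.

```python
import math

def decode(box, delta):
    """Apply center-size offsets (dx, dy, dw, dh) to a box (cx, cy, w, h)."""
    cx, cy, w, h = box
    dx, dy, dw, dh = delta
    return (cx + dx * w, cy + dy * h, w * math.exp(dw), h * math.exp(dh))

def two_step_decode(anchors, arm_deltas, odm_deltas):
    """Two-step regression without any region-wise feature cropping.

    Step 1: the first set of offsets (ARM) refines the original anchors.
    Step 2: the second set of offsets (ODM) is applied to the refined anchors.
    """
    refined = [decode(a, d) for a, d in zip(anchors, arm_deltas)]
    return [decode(r, d) for r, d in zip(refined, odm_deltas)]

# One anchor, shifted right by the first step and down by the second.
anchors = [(10.0, 10.0, 4.0, 4.0)]
arm_deltas = [(0.5, 0.0, 0.0, 0.0)]
odm_deltas = [(0.0, 0.25, 0.0, 0.0)]
final_boxes = two_step_decode(anchors, arm_deltas, odm_deltas)
```

In this sketch the second step depends only on the refined box coordinates, which is what lets the whole pipeline stay a single forward pass.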

ZhiqiJiang commented 5 years ago

@sfzhang15 Thank you for your reply. I see your point, and it is reasonable. For Faster R-CNN, the fully connected layers are also time-consuming. Does "region-wise subnetwork" mean that its input is the per-region feature maps (i.e., the output of the RoIPooling layer), cropped from the whole-image feature map according to the refined anchors, rather than the whole-image feature map together with the region locations (the regions being the refined anchors)?

sfzhang15 commented 5 years ago

@0801130205 Region-wise subnetwork means that the RoIPooling layer needs to process each region separately.

ZhiqiJiang commented 5 years ago

@sfzhang15 [Image: Screenshot from 2019-04-28 10-07-35] The image is cropped from faster_rcnn_test.pt. I think I understand what you mean, but I'm still confused about the implementation details of the RoIPooling layer (named roi_pool5 in the image above), which could expose the key difference between RefineDet and two-stage methods.

ZhiqiJiang commented 5 years ago

@sfzhang15 It is very kind of you to take the time to reply.

ZhiqiJiang commented 5 years ago

@sfzhang15 I have read the source code of Faster R-CNN except for the RoIPooling layer, so I'm still not sure about the details of that layer.

ZhiqiJiang commented 5 years ago

I will read the source code of the RoIPooling layer of Faster R-CNN on my own, so I have decided to close this issue.

sfzhang15 commented 5 years ago

@0801130205 In the RoIPooling layer, there is a for loop over the RoIs. For each RoI, it computes the RoI's location on the feature map and then crops out the corresponding features.
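The per-RoI loop described above can be sketched roughly as follows. This is a simplified, single-channel illustration in plain Python, not the actual Caffe layer. The function name `roi_max_pool`, its parameters, and the rounding and bin-boundary choices are assumptions made for the example.

```python
import math

def roi_max_pool(feature_map, rois, pooled_h, pooled_w, spatial_scale=1.0):
    """Naive RoI max pooling over one channel.

    feature_map: H x W list of lists of floats.
    rois: (x1, y1, x2, y2) boxes in input-image coordinates.
    spatial_scale: factor mapping image coordinates to feature-map coordinates.
    """
    height, width = len(feature_map), len(feature_map[0])
    pooled = []
    for x1, y1, x2, y2 in rois:  # region-wise: each RoI is handled separately
        # Project the RoI onto the feature map (round half up).
        ws = math.floor(x1 * spatial_scale + 0.5)
        hs = math.floor(y1 * spatial_scale + 0.5)
        we = math.floor(x2 * spatial_scale + 0.5)
        he = math.floor(y2 * spatial_scale + 0.5)
        bin_h = max(he - hs + 1, 1) / pooled_h
        bin_w = max(we - ws + 1, 1) / pooled_w
        out = []
        for ph in range(pooled_h):
            # Row range of this bin, clipped to the feature map.
            h0 = min(max(math.floor(ph * bin_h) + hs, 0), height)
            h1 = min(max(math.ceil((ph + 1) * bin_h) + hs, 0), height)
            row = []
            for pw in range(pooled_w):
                w0 = min(max(math.floor(pw * bin_w) + ws, 0), width)
                w1 = min(max(math.ceil((pw + 1) * bin_w) + ws, 0), width)
                # Crop the bin from the shared feature map and take its max.
                window = [feature_map[h][w]
                          for h in range(h0, h1) for w in range(w0, w1)]
                row.append(max(window) if window else 0.0)
            out.append(row)
        pooled.append(out)
    return pooled

# A 4x4 feature map with values 0..15 and two RoIs pooled to 2x2 each.
fm = [[float(4 * r + c) for c in range(4)] for r in range(4)]
pooled = roi_max_pool(fm, [(0, 0, 3, 3), (2, 2, 3, 3)], 2, 2)
```

The cost of this loop grows with the number of RoIs, and every RoI produces its own cropped feature block for the second-stage subnetwork, which is the region-wise operation that RefineDet avoids.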

ZhiqiJiang commented 5 years ago

@sfzhang15 Thank you so much. Now I'm clear about the key difference between faster rcnn and RefineDet.