Why is it named single stage?

@BB88Lee This is widely discussed in papers such as AlignDet [1], Guided Anchor [2], and DAFS [3]. In my view, the difference lies in whether NMS (which is the most time-consuming part) is applied in the proposal generation module. Furthermore, RoI pooling (which requires pixel binning) is more complex than feature warping/sampling/adaptation (e.g. applying deconv to adapt the feature map based on guided anchors). The boundary between two-stage and one-stage detectors is becoming more and more ambiguous.
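To make the NMS point concrete, here is a minimal pure-Python sketch of greedy NMS, the step a two-stage detector typically runs on its proposals before the second stage. This is not code from SA-SSD or from any of the cited papers; the function names `iou` and `greedy_nms`, the `(x1, y1, x2, y2)` box format, and the 0.5 threshold are assumptions chosen for illustration.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes, scores, iou_threshold=0.5):
    """Return indices of kept boxes, highest score first (hypothetical helper)."""
    # Visit candidates from the highest score to the lowest.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with any kept box.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
kept = greedy_nms(boxes, scores)
```

Each candidate is compared against every box already kept, so the cost grows roughly quadratically with the number of proposals in the worst case. That is one reason skipping this step in the proposal module can matter for speed.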

[1] Chen, Yuntao, Chenxia Han, Naiyan Wang, and Zhaoxiang Zhang. "Revisiting feature alignment for one-stage object detection." arXiv preprint arXiv:1908.01570 (2019).
[2] Wang, Jiaqi, Kai Chen, Shuo Yang, Chen Change Loy, and Dahua Lin. "Region proposal by guided anchoring." In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2965-2974. 2019.
[3] Li, Shuai, Lingxiao Yang, Jianqiang Huang, Xian-Sheng Hua, and Lei Zhang. "Dynamic anchor feature selection for single-shot object detection." In Proceedings of the IEEE International Conference on Computer Vision, pp. 6609-6618. 2019.

