[79] FCOS: Fully Convolutional One-Stage Object Detection

TL;DR

task : anchor-free object detection
problem : anchor-based object detectors 1) are sensitive to hyperparameters, 2) keep the scale / aspect ratio of the anchors fixed (even though they regress relative offsets), 3) place anchor boxes densely over the image (about 180K anchor boxes for an image whose shorter side is around 800), and 4) need IoU to match anchors with GT boxes, which complicates the computation.
idea : do object detection per pixel with a fully convolutional network, the way semantic segmentation does.
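architecture : a feature pyramid is built on a CNN backbone (ResNet-50). P3, P4, P5 come from C3, C4, C5 through a 1 x 1 conv (with FPN top-down connections), and P6, P7 come from a stride 2 conv applied to P5 and P6 respectively. When objects overlap, it is ambiguous which box a pixel should predict; assigning objects to pyramid levels by size resolves most of these cases. A separate center-ness head, trained as a 0~1 sigmoid output, down-weights low-quality boxes predicted far from an object's center.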
objective : focal loss for cls, IoU loss for bbox regression
baseline : Faster R-CNN, YOLOv2, SSD, DSSD, RetinaNet, CornerNet
data : COCO
result : SOTA!
contribution : questions whether anchor boxes are really necessary, and settles it with impressive performance.
limitation or unclear points : how is BPR (the upper bound of the recall rate that a detector can achieve) measured?
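
The per-pixel targets mentioned under idea, architecture and objective can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, the formulas (location mapping, (l, t, r, b) distances, center-ness, and -ln(IoU) as the IoU loss) follow my reading of the paper, and every location is assumed to lie strictly inside the GT box.

```python
import math

def location_to_image_xy(ix, iy, stride):
    """Map feature-map cell (ix, iy) to an image point near the center of its receptive field."""
    return stride // 2 + ix * stride, stride // 2 + iy * stride

def ltrb_targets(x, y, box):
    """Distances from location (x, y) to the left, top, right, bottom sides of a GT box (x0, y0, x1, y1)."""
    x0, y0, x1, y1 = box
    return x - x0, y - y0, x1 - x, y1 - y

def centerness(l, t, r, b):
    """Center-ness target in (0, 1]: 1 at the box center, decaying towards the box border."""
    return math.sqrt((min(l, r) / max(l, r)) * (min(t, b) / max(t, b)))

def iou_loss(pred, target):
    """-ln(IoU) of two boxes given as (l, t, r, b) distances from the same location (assumed loss form)."""
    pl, pt, pr, pb = pred
    tl, tt, tr, tb = target
    area_pred = (pl + pr) * (pt + pb)
    area_target = (tl + tr) * (tt + tb)
    # Both boxes contain the location, so the overlap is measured side by side.
    inter = (min(pl, tl) + min(pr, tr)) * (min(pt, tt) + min(pb, tb))
    union = area_pred + area_target - inter
    return -math.log(inter / union)

gt_box = (100, 100, 300, 200)               # hypothetical GT box in image coordinates
x, y = location_to_image_xy(25, 18, 8)      # cell (25, 18) on a stride-8 level such as P3
l, t, r, b = ltrb_targets(x, y, gt_box)
print((x, y), (l, t, r, b), round(centerness(l, t, r, b), 4))
```

A location near the box center gets a center-ness close to 1, while one near the border gets a value close to 0, so multiplying the classification score by the predicted center-ness at test time pushes off-center boxes down the ranking before NMS.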