Question faster RCNN anchor and sliding windows

smallcorgi / Faster-RCNN_TF

Faster-RCNN in Tensorflow

MIT License


Question faster RCNN anchor and sliding windows #285

Open Shiro-LK opened 6 years ago

Shiro-LK commented 6 years ago

Hello, I am confused about Faster R-CNN and its sliding window. The sliding window (a 3x3x512 kernel for VGG16) is applied to the feature map, but I do not understand how the anchor boxes are obtained. There are 3 scales: 128², 256², 512². Does that mean we rescale the sliding window to get the different scales? Or do we rescale the input image of the convolutional neural network?

Thank you for the help!

jqJordan commented 6 years ago

Hey @Shiro-LK, I am new to Faster R-CNN and I have the same confusion. Have you figured it out? I am confused by how a single 3x3 sliding window at each position on the feature map represents 9 different anchor boxes. It seems that, for each position on the final feature map, a 3x3 convolution is performed to generate a 512-d vector, and this 512-d vector is then mapped by a 1x1 conv to an 18-d vector holding the objectness scores (2 per anchor, for 9 anchors). How does this single 512-d vector represent 9 different anchors?
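To make the shapes in this question concrete, here is a minimal NumPy sketch of the 3x3 conv followed by the 1x1 conv. It is not the repository's TensorFlow code: the function name `rpn_cls_head`, the random weights, the 4x5 feature-map size and the `(k, 2)` channel layout are all assumptions made for illustration.

```python
import numpy as np

def rpn_cls_head(feat, w3, w_cls):
    """Hypothetical helper: 3x3 conv (zero padding) + ReLU, then a 1x1 conv.

    feat:  (H, W, C) feature map
    w3:    (3, 3, C, C_out) weights of the 3x3 sliding window
    w_cls: (C_out, 2 * k) weights of the 1x1 conv
    """
    h, w, _ = feat.shape
    padded = np.pad(feat, ((1, 1), (1, 1), (0, 0)))  # zero-pad height and width
    hidden = np.zeros((h, w, w3.shape[3]))
    # Accumulate the 9 taps of the 3x3 window.
    for dy in range(3):
        for dx in range(3):
            hidden += padded[dy:dy + h, dx:dx + w, :] @ w3[dy, dx]
    hidden = np.maximum(hidden, 0.0)  # ReLU
    # A 1x1 conv is a matrix product applied independently at each position.
    scores = hidden @ w_cls
    return hidden, scores

rng = np.random.default_rng(0)
k = 9  # anchors per position
feat = rng.standard_normal((4, 5, 512))            # tiny stand-in for a VGG16 feature map
w3 = rng.standard_normal((3, 3, 512, 512)) * 0.01  # random weights, illustration only
w_cls = rng.standard_normal((512, 2 * k)) * 0.01

hidden, scores = rpn_cls_head(feat, w3, w_cls)
# Assumed layout: channels 2*i and 2*i + 1 belong to anchor i.
per_anchor = scores.reshape(4, 5, k, 2)
print(hidden.shape, scores.shape, per_anchor.shape)
```

In this sketch every anchor reads the same 512-d vector but has its own columns of `w_cls`, so the anchor shapes never enter the convolution itself; they only determine which reference box each pair of output channels is scored against.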

Shiro-LK commented 6 years ago

@jqJordan From what I understood: k anchors (9 here) are created at each sliding-window position, not once per image. The anchors are defined in input-image pixels and are rescaled to account for the downscaling of the network. For example, with VGG16 the downscaling factor is 16, so on the feature map a 512-pixel anchor spans 512/16 = 32 cells rather than 512. The 3x3 sliding window is not the anchor itself. The 512-d vector is the feature vector computed at that position.
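The explanation above can be sketched in plain Python. This is a simplified illustration, not the repository's anchor code: the helper names `base_anchors` and `all_anchors`, the three aspect ratios (0.5, 1, 2) and the choice to centre anchors in the middle of each stride-16 cell are assumptions, and a real implementation may round or centre differently.

```python
import math

def base_anchors(scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Hypothetical helper: k = len(scales) * len(ratios) boxes centred on (0, 0).

    Boxes are (x1, y1, x2, y2) in input-image pixels; ratio = height / width.
    """
    boxes = []
    for ratio in ratios:
        for scale in scales:
            area = scale * scale          # each scale fixes the box area
            w = math.sqrt(area / ratio)
            h = w * ratio
            boxes.append((-w / 2, -h / 2, w / 2, h / 2))
    return boxes

def all_anchors(feat_h, feat_w, stride=16):
    """Place the same k base anchors at every feature-map position."""
    base = base_anchors()
    anchors = []
    for fy in range(feat_h):
        for fx in range(feat_w):
            # Assumption: map a feature-map cell to the centre of its
            # stride x stride patch in the input image.
            cx = fx * stride + stride / 2
            cy = fy * stride + stride / 2
            for x1, y1, x2, y2 in base:
                anchors.append((cx + x1, cy + y1, cx + x2, cy + y2))
    return anchors

anchors = all_anchors(feat_h=2, feat_w=3)
print(len(anchors))  # 2 * 3 positions * 9 anchors = 54
```

Neither the sliding window nor the input image is rescaled here: the anchors are just a fixed set of reference boxes in image coordinates, repeated at every position, and dividing their side length by the stride gives their extent in feature-map cells.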