DXZDXZ closed 3 years ago
For a convolution or pooling layer, the coordinate mapping can be expressed as `pos_layer1 = stride * pos_layer2 + ((kernel - 1) / 2 - padding)`. Because of padding, the mapping between the feature map and the original image has an offset. Given the network structure of the modified GoogLeNet, the offset of the feature map relative to the original image is 13.
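To make the calculation concrete, here is a minimal sketch of how the per-layer mapping could be composed across a stack of layers. The function name `compose_mapping` and the layer list are hypothetical and only for illustration; they are not the actual layer configuration of the modified GoogLeNet, so the numbers below are not expected to reproduce 13.

```python
def compose_mapping(layers):
    """Compose pos_in = stride * pos_out + ((kernel - 1) / 2 - padding)
    over a list of (kernel, stride, padding) tuples, ordered from the
    input image towards the output feature map.

    Returns (total_stride, total_offset) such that
    pos_image = total_stride * pos_feature + total_offset.
    """
    total_stride = 1
    total_offset = 0.0
    for kernel, stride, padding in layers:
        # Offset of this layer, scaled by the stride accumulated so far.
        total_offset += total_stride * ((kernel - 1) / 2 - padding)
        total_stride *= stride
    return total_stride, total_offset


# Hypothetical stack (NOT the real backbone): (kernel, stride, padding)
example_layers = [(7, 2, 0), (3, 2, 0), (3, 1, 1)]
print(compose_mapping(example_layers))  # (4, 5.0)
```

Plugging in the real `(kernel, stride, padding)` values of every convolution and pooling layer in the backbone, in order, should give the overall stride and offset in the same way.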
I understand. THANK YOU!
I still cannot understand why cfg.BACKBONE.OFFSET equals 13. Could you tell me how to calculate it?