Open tanger830 opened 7 years ago
I suspect that you are right. Thinking of the original Caffe implementation, I know that the shorter side of the image gets scaled to 600px and, if I remember correctly, the corresponding size of the feature map was 38px, which is round(600/(2^4)) = round(37.5) = 38. Therefore, all of the other occurrences of H/4 and W/4 should be H/(2^4) and W/(2^4), which also corresponds well to the feature stride of 16.
So it might indeed be a small mistake in the diagram. Nevertheless, it's really awesome and makes it much easier to understand Faster R-CNN in detail.
Thanks for this great work!
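For anyone who wants to check the arithmetic, here is a minimal sketch of it. It assumes four 2x2, stride-2 max-pooling layers in front of the feature map, ceil rounding in the pooling output size (as I believe Caffe does), and 3x3 convolutions with padding 1 that leave the spatial size unchanged. The function name `pooled_size` is made up for illustration and is not taken from any repository.

```python
import math


def pooled_size(size, num_pools=4, kernel=2, stride=2):
    """Spatial size after `num_pools` pooling layers (ceil rounding assumed)."""
    for _ in range(num_pools):
        # Assumed Caffe-style pooling output size: ceil((in - kernel) / stride) + 1
        size = math.ceil((size - kernel) / stride) + 1
    return size


if __name__ == "__main__":
    shorter_side = 600
    print(pooled_size(shorter_side))               # four poolings -> 38
    print(shorter_side / 2 ** 4)                   # plain division by 16 -> 37.5
    print(pooled_size(shorter_side, num_pools=2))  # only two poolings (H/4) -> 150
```

Under these assumptions, 600 goes 300, 150, 75, 38, which matches the 38px I remember and a feature stride of 16, whereas H/4 would give 150.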
It's a really beautiful workflow diagram. In the "Feature maps (N, 512, H/4, W/4)" box, I am confused: after four pooling layers in the VGG network, shouldn't the height (and width) be H/(2^4) = H/16?