mitmul / chainer-faster-rcnn

Object Detection with Faster R-CNN in Chainer
MIT License

Is the feature map width really W/4? #10

Open tanger830 opened 7 years ago

tanger830 commented 7 years ago

It's a really beautiful workflow diagram. One thing confuses me about the feature maps (N, 512, H/4, W/4): after four pooling layers in the VGG network, shouldn't the height (and width) be H/(2^4) = H/16?

manuelschmidt commented 7 years ago

I suspect that you are right. In the original Caffe implementation, the shorter side of the image is scaled to 600 px and, if I remember correctly, the corresponding feature map size was 38 px, which is round(600/(2^4)) = round(37.5) = 38. Therefore all other occurrences of H/4 and W/4 should be H/(2^4) and W/(2^4), which also corresponds well to the feature stride of 16.
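As a quick sanity check of that arithmetic, here is a minimal sketch in plain Python. It assumes each of the four pooling layers is a 2x2, stride-2 pooling and that the convolutions in between keep the spatial size unchanged. The `feature_map_size` helper and its `ceil_mode` flag are hypothetical names for illustration, not part of this repository, and rounding up at each pooling step is also an assumption.

```python
import math

NUM_POOLS = 4                    # pooling layers before the feature map (per this thread)
FEATURE_STRIDE = 2 ** NUM_POOLS  # 16

def feature_map_size(side, num_pools=NUM_POOLS, ceil_mode=True):
    """Spatial size of one side after `num_pools` stride-2 poolings.

    Assumption: each pooling halves the side, rounding up when
    ceil_mode is True and down otherwise.
    """
    for _ in range(num_pools):
        side = math.ceil(side / 2) if ceil_mode else side // 2
    return side

print(FEATURE_STRIDE)                          # 16
print(feature_map_size(600))                   # 600 -> 300 -> 150 -> 75 -> 38
print(feature_map_size(600, ceil_mode=False))  # 600 -> 300 -> 150 -> 75 -> 37
```

Under these assumptions a 600 px side gives 38 when rounding up and 37 when rounding down. Either way it is about H/16, not H/4 (which would be 150).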

So it might indeed be a small mistake in the diagram. Nevertheless, the diagram is really awesome and makes it much easier to understand Faster R-CNN in detail.

Thanks for this great work!