Quote: I wonder whether the number of RPN scales would change the performance much. The original code you provided uses 5 scales in total; have you ever tried 3 scales?
We haven't tried using only 3 scales, but I guess that will make the performance worse.
Quote: By the way, the official implementation of Faster R-CNN uses anchor_sizes = ((32, 64, 128, 256, 512, ), ) * 3 as the anchor scales; why did you change it to ((16,), (32,), (64,), (128,), (256,), (512,))? And since you only send the first 4 scales to the second stage of detection, anchor scales of 256 and larger are not used at all. Can you explain it?
We added the 16 scale so that we could better detect tiny faces. If I recall correctly, each feature map is using multi-scale anchors, so all anchors should be used here.
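To make that concrete, here is a minimal pure-Python sketch of what an anchor size means in box terms, following the usual Faster R-CNN convention (the (0.5, 1.0, 2.0) aspect ratios are assumed here, not taken from the repo):

```python
from math import sqrt

# For an anchor size s and aspect ratio r (height/width), the usual
# Faster R-CNN convention keeps the area near s*s and sets
# h = s * sqrt(r), w = s / sqrt(r).
sizes = (16, 32, 64, 128, 256, 512)   # 16 is the scale added for tiny faces
aspect_ratios = (0.5, 1.0, 2.0)       # common defaults (an assumption here)

for s in sizes:
    boxes = [(round(s / sqrt(r)), round(s * sqrt(r))) for r in aspect_ratios]
    print(f"size {s:3d}: (w, h) = {boxes}")

# Size 16 yields roughly 23x11, 16x16 and 11x23 boxes, which is what lets
# the RPN propose regions around genuinely tiny faces.
```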
Quote: Sorry for bothering you; I ask these questions because my computing resources are very limited (only one GTX 1080 Ti). Since FPN uses too much memory, I would like to optimize it to be more efficient and cheaper to apply. Thanks again.
No worries. If you want to speed up the FPN, just drop the extra small scale we added (16).
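To see why that particular scale is the expensive one, here is a rough back-of-envelope count of RPN anchors per pyramid level; the 800 px input, the strides, and the 3 aspect ratios are assumptions for illustration only, not values read from img2pose:

```python
# Rough per-level anchor count for a square input image.
# torchvision pairs anchor sizes with pyramid levels in order, so the
# smallest size sits on the finest (largest) feature map.
image_side = 800                      # assumed input size
strides = [4, 8, 16, 32, 64, 128]     # assumed strides for six levels
ratios_per_size = 3                   # e.g. (0.5, 1.0, 2.0)

for stride in strides:
    cells = (image_side // stride) ** 2
    print(f"stride {stride:3d}: ~{cells * ratios_per_size} anchors")

# The finest level dominates: ~120k anchors at stride 4 versus ~30k at
# stride 8 and only ~10k for all coarser levels combined, so removing the
# extra small scale (and its level) gives the largest saving.
```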
Hope this helps.
Quote: If I recall correctly, each feature map is using multi-scale anchors, so all anchors should be used here.
If I understand the code correctly, as the code in rpn shows (here and here), the RPNHead only predicts anchor offsets for the different aspect ratios of a single scale on each feature map. In the code here, you only send features 0, 1, 2, 3 to the second stage, which means the fifth scale, pool, is just discarded. Can you explain more about it?
pool is indeed discarded, exactly like in the official Faster-RCNN.
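For context, this mirrors the torchvision Faster R-CNN + FPN setup, where the RPN consumes every pyramid output but the box head pools RoI features only from the first four maps. A minimal sketch, with output_size and sampling_ratio set to the usual torchvision defaults (stated here as an assumption rather than read from img2pose):

```python
from torchvision.ops import MultiScaleRoIAlign

# The RPN scores anchors on every FPN output, including 'pool'.
# The second stage only pools RoI features from levels '0'..'3', so the
# 'pool' map never feeds the box head, but proposals generated from its
# anchors still flow through the RPN.
box_roi_pool = MultiScaleRoIAlign(
    featmap_names=["0", "1", "2", "3"],  # 'pool' deliberately left out
    output_size=7,                       # assumed typical default
    sampling_ratio=2,                    # assumed typical default
)
```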
For the RPN, our only change was to add another anchor size; everything else is standard Faster-RCNN.
If you have questions regarding design choices in Faster-RCNN, I recommend asking on this repo, as one of the Faster-RCNN authors might reply.
Alright, thank you. I think your work is a good entry point for learning object detection. Thanks again.
https://github.com/vitoralbiero/img2pose/issues/53#issuecomment-1001660033
I don't think so. Actually, you apply FPN in your code, so anchors of different scales are assigned to feature maps of different sizes. The original implementation of Faster R-CNN did not use FPN.
I did not mean the original Faster R-CNN, but rather the official Faster R-CNN with FPN implementation in torchvision.
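To illustrate the distinction, a minimal sketch using the torchvision AnchorGenerator API (the aspect ratios are the common (0.5, 1.0, 2.0) defaults, assumed here; in older torchvision versions the class lives in torchvision.models.detection.rpn):

```python
from torchvision.models.detection.anchor_utils import AnchorGenerator

# Plain Faster R-CNN style: a single feature map carrying all five sizes.
single_map_gen = AnchorGenerator(
    sizes=((32, 64, 128, 256, 512),),
    aspect_ratios=((0.5, 1.0, 2.0),),
)
print(single_map_gen.num_anchors_per_location())  # [15]

# FPN style: one sizes tuple per pyramid level, so each level carries
# exactly one scale (times the aspect ratios).
fpn_gen = AnchorGenerator(
    sizes=((16,), (32,), (64,), (128,), (256,), (512,)),
    aspect_ratios=((0.5, 1.0, 2.0),) * 6,
)
print(fpn_gen.num_anchors_per_location())  # [3, 3, 3, 3, 3, 3]
```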
🏎️ Thanks again for your work.