Quote: I wonder whether the number of RPN scales would change the performance much. The original code you provided uses 5 scales in total; have you ever tried 3 scales?
We haven't tried using only 3 scales, but I guess that will make the performance worse.
Quote: By the way, the official implementation of Faster R-CNN uses anchor_sizes = ((32, 64, 128, 256, 512, ), ) * 3 as the anchor scales; why did you change it to ((16,), (32,), (64,), (128,), (256,), (512,))? And since you only send the first 4 scales to the second stage of detection, anchor scales of 256 and larger are not used at all. Can you explain it?
We added the 16 scale so that we could better detect tiny faces. If I recall correctly, each feature map is using multi-scale anchors, so all anchors should be used here.
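To make that concrete, here is a minimal pure-Python sketch of what an anchor size means in box terms, following the usual Faster R-CNN convention (the (0.5, 1.0, 2.0) aspect ratios are assumed here, not taken from the repo):

```python
from math import sqrt

# For an anchor size s and aspect ratio r (height/width), the usual
# Faster R-CNN convention keeps the area near s*s and sets
# h = s * sqrt(r), w = s / sqrt(r).
sizes = (16, 32, 64, 128, 256, 512)   # 16 is the scale added for tiny faces
aspect_ratios = (0.5, 1.0, 2.0)       # common defaults (an assumption here)

for s in sizes:
    boxes = [(round(s / sqrt(r)), round(s * sqrt(r))) for r in aspect_ratios]
    print(f"size {s:3d}: (w, h) = {boxes}")

# Size 16 yields roughly 23x11, 16x16 and 11x23 boxes, which is what lets
# the RPN propose regions around genuinely tiny faces.
```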
Quote: Sorry for bothering you; I ask these questions because my computing resources are very limited (only one GTX 1080 Ti). Since FPN uses too much memory, I would like to optimize it to be more efficient and cheaper to apply. Thanks again.
No worries. If you want to speed up the FPN, just drop the extra small scale we added (16).
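To see why that particular scale is the expensive one, here is a rough back-of-envelope count of RPN anchors per pyramid level; the 800 px input, the strides, and the 3 aspect ratios are assumptions for illustration only, not values read from img2pose:

```python
# Rough per-level anchor count for a square input image.
# torchvision pairs anchor sizes with pyramid levels in order, so the
# smallest size sits on the finest (largest) feature map.
image_side = 800                      # assumed input size
strides = [4, 8, 16, 32, 64, 128]     # assumed strides for six levels
ratios_per_size = 3                   # e.g. (0.5, 1.0, 2.0)

for stride in strides:
    cells = (image_side // stride) ** 2
    print(f"stride {stride:3d}: ~{cells * ratios_per_size} anchors")

# The finest level dominates: ~120k anchors at stride 4 versus ~30k at
# stride 8 and only ~10k for all coarser levels combined, so removing the
# extra small scale (and its level) gives the largest saving.
```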
Hope this helps.
Quote: If I recall correctly, each feature map is using multi-scale anchors, so all anchors should be used here.
If I understand the code correctly, as the code in rpn shows (here and here), the RPNHead only predicts anchor offsets for the different aspect ratios of a single scale on each feature map. In the code here, you only send features 0, 1, 2, 3 to the second stage, which means the fifth scale, pool, is just discarded. Can you explain more about it?
pool is indeed discarded, exactly like in the official Faster-RCNN.
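For context, this mirrors the torchvision Faster R-CNN + FPN setup, where the RPN consumes every pyramid output but the box head pools RoI features only from the first four maps. A minimal sketch, with output_size and sampling_ratio set to the usual torchvision defaults (stated here as an assumption rather than read from img2pose):

```python
from torchvision.ops import MultiScaleRoIAlign

# The RPN scores anchors on every FPN output, including 'pool'.
# The second stage only pools RoI features from levels '0'..'3', so the
# 'pool' map never feeds the box head, but proposals generated from its
# anchors still flow through the RPN.
box_roi_pool = MultiScaleRoIAlign(
    featmap_names=["0", "1", "2", "3"],  # 'pool' deliberately left out
    output_size=7,                       # assumed typical default
    sampling_ratio=2,                    # assumed typical default
)
```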
For the RPN, our only change was to add another anchor size; everything else is standard Faster-RCNN.
If you have questions regarding design choices in Faster-RCNN, I recommend asking on this repo, as one of the Faster-RCNN authors might reply.
Alright, thank you. I think your work is a good entry point for learning object detection. Thanks again.
https://github.com/vitoralbiero/img2pose/issues/53#issuecomment-1001660033
I don't think so. Actually, you apply FPN in your code, so anchors of different scales are assigned to feature maps of different sizes. The original implementation of Faster R-CNN did not use FPN.
I did not mean the original Faster R-CNN, but rather the official Faster R-CNN with FPN implementation in torchvision.
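To illustrate the distinction, a minimal sketch using the torchvision AnchorGenerator API (the aspect ratios are the common (0.5, 1.0, 2.0) defaults, assumed here; in older torchvision versions the class lives in torchvision.models.detection.rpn):

```python
from torchvision.models.detection.anchor_utils import AnchorGenerator

# Plain Faster R-CNN style: a single feature map carrying all five sizes.
single_map_gen = AnchorGenerator(
    sizes=((32, 64, 128, 256, 512),),
    aspect_ratios=((0.5, 1.0, 2.0),),
)
print(single_map_gen.num_anchors_per_location())  # [15]

# FPN style: one sizes tuple per pyramid level, so each level carries
# exactly one scale (times the aspect ratios).
fpn_gen = AnchorGenerator(
    sizes=((16,), (32,), (64,), (128,), (256,), (512,)),
    aspect_ratios=((0.5, 1.0, 2.0),) * 6,
)
print(fpn_gen.num_anchors_per_location())  # [3, 3, 3, 3, 3, 3]
```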
🏎️ Thanks again for your work.