apatsekin closed this issue 5 years ago.
Sorry for the confusion: `image_anchor` is the number of anchors per image selected for training. We will add a fully annotated config soon.
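To make "anchors per image selected for training" concrete, here is a minimal NumPy sketch of RPN anchor sampling in the style of the Faster R-CNN paper. It is not SimpleDet's code: the helper name `sample_rpn_anchors`, the `pos_fraction` parameter and the 0.7/0.3 IoU thresholds are assumptions (the paper's defaults), and the paper's extra rule that the best-matching anchor for each ground-truth box is always positive is omitted.

```python
import numpy as np


def sample_rpn_anchors(max_iou, image_anchor=256, pos_fraction=0.5,
                       pos_thr=0.7, neg_thr=0.3, rng=None):
    """Label anchors 1 (fg), 0 (bg) or -1 (ignored by the loss).

    max_iou: best IoU of each anchor with any ground-truth box.
    Hypothetical helper; thresholds and pos_fraction are assumed defaults.
    At most `image_anchor` anchors end up labelled 0 or 1.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    labels = np.full(max_iou.shape, -1, dtype=np.int64)
    labels[max_iou < neg_thr] = 0
    labels[max_iou >= pos_thr] = 1

    # Cap positives at image_anchor * pos_fraction; extras become "ignored".
    pos = np.flatnonzero(labels == 1)
    max_pos = int(image_anchor * pos_fraction)
    if len(pos) > max_pos:
        labels[rng.choice(pos, len(pos) - max_pos, replace=False)] = -1

    # Fill the remaining budget with negatives; extras become "ignored".
    neg = np.flatnonzero(labels == 0)
    max_neg = image_anchor - int((labels == 1).sum())
    if len(neg) > max_neg:
        labels[rng.choice(neg, len(neg) - max_neg, replace=False)] = -1
    return labels
```

With `image_anchor = 256` and plenty of candidates, 128 anchors are labelled foreground, 128 background, and every other anchor is ignored by the RPN loss.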
Solved by #217
Thank you @RogerChern for the quick response!
Please correct me if I am wrong:
```
class anchor_generate:
    scale = (2, 4, 8, 16, 32)  # corresponds to 32x32, 64x64, 128x128, 256x256, 512x512 of the original image
    ratio = (0.5, 1.0, 2.0)  # 16x32, 32x32, 32x64, etc...
    stride = 16  # one "pixel" in the feature map corresponds to 16x16 of the original image
    image_anchor = 256  # number of top-confidence regions passed from the RPN to the classification head
```
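The sketch below enumerates the anchor shapes for the config above, assuming SimpleDet follows the py-faster-rcnn `generate_anchors` convention: the base box is `stride x stride`, each ratio (taken as height/width) reshapes it at roughly constant area, and each scale multiplies both sides. The function name is hypothetical, and rounding details may differ slightly from the MXNet operator.

```python
import math


def generate_anchors(stride=16, scales=(2, 4, 8, 16, 32), ratios=(0.5, 1.0, 2.0)):
    """Return (width, height) of every base anchor; ratio = height / width.

    Hypothetical helper mirroring the py-faster-rcnn convention (assumption).
    """
    anchors = []
    area = stride * stride  # base box is stride x stride, e.g. 16x16
    for r in ratios:
        w = round(math.sqrt(area / r))  # width that keeps the area ~constant
        h = round(w * r)                # height = width * ratio
        for s in scales:
            anchors.append((w * s, h * s))  # scale multiplies both sides
    return anchors
```

For example, `[w for w, h in generate_anchors()[5:10]]` gives the square anchors `[32, 64, 128, 256, 512]`, so scale 32 means 32 x 16 = 512 pixels per side. Non-square anchors keep roughly the same area as the square one, e.g. 46x24 and 22x44 (w x h) at scale 2.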
I also read your doc with the config comments. Taking the same TridentNet config as an example:
```
class subsample_proposal:
    proposal_wo_gt = True  # not actually sure what "proposals without ground truth" means in this context?
    image_roi = 128  # number of anchor boxes randomly sampled for training
    fg_fraction = 0.5  # fraction of foreground boxes among those 128. Why does the doc say "the maximum fraction", given that FG boxes are scarce compared to BG and usually undersampled?
    fg_thr = 0.5  # boxes with GT IoU in [0.5, 1.0] are assigned to FG for the loss function
    bg_thr_hi = 0.5  # boxes with IoU in [0.0, 0.5) are assigned to background for the loss function
    bg_thr_lo = 0.0  # background boxes with IoU below this one are dropped from the loss?
```
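A minimal NumPy sketch of how these `subsample_proposal` parameters could interact, based on the comments above and the Fast R-CNN sampling scheme. The function name is hypothetical, `max_iou` is assumed to already hold each proposal's best IoU with any ground-truth box, and ground-truth boxes are not appended to the proposals here.

```python
import numpy as np


def subsample_proposals(max_iou, image_roi=128, fg_fraction=0.5,
                        fg_thr=0.5, bg_thr_hi=0.5, bg_thr_lo=0.0, rng=None):
    """Return (fg_indices, bg_indices) of proposals kept for the RCNN head.

    Hypothetical helper; parameter names follow the config above.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    fg = np.flatnonzero(max_iou >= fg_thr)
    bg = np.flatnonzero((max_iou < bg_thr_hi) & (max_iou >= bg_thr_lo))

    # fg_fraction is an upper bound: if fewer foreground proposals exist,
    # all of them are kept and background fills the remaining slots.
    n_fg = min(int(round(image_roi * fg_fraction)), len(fg))
    n_bg = min(image_roi - n_fg, len(bg))
    fg = rng.choice(fg, n_fg, replace=False)
    bg = rng.choice(bg, n_bg, replace=False)
    return fg, bg
```

This also illustrates why `fg_fraction` is described as a maximum: with only 10 foreground proposals and `image_roi = 128`, all 10 are kept and up to 118 background proposals fill the rest of the batch.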
The question is: the original Faster R-CNN paper uses [0, 0.3] for background, drops [0.3, 0.7] from the loss and uses [0.7, 1.0] for foreground. In the TridentNet config it looks like you use a strict split: [0.5, 1.0] for FG and [0, 0.5) for BG. Is that correct? Thanks again for being responsive!
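Proposal subsampling generates the targets for the RCNN bbox head, not for the RPN head. The [0, 0.3] / [0.7, 1.0] thresholds from the Faster R-CNN paper apply to RPN anchor assignment; for the detection head, Faster R-CNN follows Fast R-CNN and treats proposals with IoU >= 0.5 as foreground.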
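For reference, a sketch of what "the target for the RCNN bbox head" typically means, assuming the standard R-CNN box parameterization (center offsets scaled by proposal size, log size ratios) on `(x1, y1, x2, y2)` boxes. This is not SimpleDet's implementation, which may additionally normalize the deltas or use +1 pixel widths; the function name is hypothetical.

```python
import numpy as np


def bbox_targets(proposals, gt_boxes):
    """Encode matched GT boxes relative to proposals as (dx, dy, dw, dh).

    proposals, gt_boxes: float arrays of shape (N, 4) in (x1, y1, x2, y2),
    row i of gt_boxes being the ground truth matched to proposal i.
    """
    pw = proposals[:, 2] - proposals[:, 0]
    ph = proposals[:, 3] - proposals[:, 1]
    px = proposals[:, 0] + 0.5 * pw  # proposal center x
    py = proposals[:, 1] + 0.5 * ph  # proposal center y
    gw = gt_boxes[:, 2] - gt_boxes[:, 0]
    gh = gt_boxes[:, 3] - gt_boxes[:, 1]
    gx = gt_boxes[:, 0] + 0.5 * gw
    gy = gt_boxes[:, 1] + 0.5 * gh
    return np.stack([(gx - px) / pw, (gy - py) / ph,
                     np.log(gw / pw), np.log(gh / ph)], axis=1)
```

Only foreground proposals receive regression targets like these; background proposals contribute only to the classification loss.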
In the TridentNet configuration file (here), the Region Proposal Network params are defined as follows:
What do `image_anchor` and `scale` actually mean? My initial assumption was that it takes a base anchor size of 256x256 and produces 5 scales and 3 ratios according to the `scale` and `ratio` multipliers. However, scale 32 applied to 256^2 (a box of 32 x 256 = 8192 pixels per side) doesn't make any sense. Could someone explain in detail what exactly those parameters mean? The source code just passes them to an mxnet function, which looks like an interface without an implementation, and the documentation just says "Used to generate anchor windows by enumerating scales".