Open gnoynait opened 8 years ago
Are the ground truth boxes that have less than 0.5 IoU with every prior box usually small?
Thanks, Wei,
@gnoynait Thanks for the explanation. How do you calculate the center of the receptive field? It might be easy to derive for VGG, but what about an Inception-style network? I haven't checked the details of how RPN places the anchor boxes. I tiled the prior boxes in a very simple and naive way, which does not seem to be optimal, as you pointed out. Do you see any improvement if you fix the prior box placement? I am wondering if that is more of an issue for small prior boxes (e.g. those on conv4_3).
2) It seems reasonable. I guess your dataset has many small objects? One possible way to solve it is to place small prior boxes more densely. Again, after you fix the bipartite matching bug, do you see any improvement?
Thanks again for spending time explaining these in detail. I really appreciate it!
@gnoynait
I looked at the code, and I don't think your point 2) is correct. From here, these two loops try to find the best match among all remaining <prior box, ground truth> pairs. I don't think your second plot is correct. The code might not be the most efficient bipartite matching, but it should be correct. Correct me if I am wrong.
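To make the matching strategy described above concrete, here is a minimal sketch of a greedy bipartite match over an IoU matrix. It is not the SSD code itself; the function name `greedy_bipartite_match` and the list-of-lists layout `overlaps[prior][gt]` are assumptions made for illustration.

```python
def greedy_bipartite_match(overlaps):
    """Greedy one-to-one matching of ground truths to priors.

    overlaps[p][g] is the IoU between prior p and ground truth g
    (hypothetical layout, chosen for this sketch).
    Returns a dict mapping ground truth index -> prior index.
    """
    num_priors = len(overlaps)
    num_gts = len(overlaps[0]) if num_priors else 0
    free_priors = set(range(num_priors))
    free_gts = set(range(num_gts))
    matches = {}

    while free_priors and free_gts:
        best = None  # (iou, prior, gt) of the best remaining pair
        # Two nested loops over every remaining <prior box, ground truth> pair.
        for p in sorted(free_priors):
            for g in sorted(free_gts):
                iou = overlaps[p][g]
                if iou > 0 and (best is None or iou > best[0]):
                    best = (iou, p, g)
        if best is None:
            break  # no remaining pair overlaps at all
        _, p, g = best
        matches[g] = p
        free_priors.discard(p)
        free_gts.discard(g)
    return matches


if __name__ == "__main__":
    overlaps = [
        [0.6, 0.5],  # prior 0
        [0.7, 0.1],  # prior 1
        [0.2, 0.4],  # prior 2
    ]
    print(greedy_bipartite_match(overlaps))
```

Each round picks the globally best remaining pair, so every ground truth ends up with a distinct prior as long as some prior overlaps it. The cost is quadratic per round, which fits the remark that it may not be the most efficient approach.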
@gnoynait
I also spent some time looking in detail at how Faster R-CNN implements its anchor boxes. Specifically, generate_anchors.py generates a set of anchors and anchor_target_layer.py places those anchors at each cell of a feature map.
The only difference I see is that RPN's anchor boxes are centered at the top left corner of each cell, and SSD's default boxes are centered at the center of each cell. I don't think I agree with your plot 1.
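The two placements described above differ only by a per-cell offset. The following is a small sketch, assuming a square cell of `step` pixels; the helper name `tile_centers` and its parameters are hypothetical and not taken from either code base.

```python
def tile_centers(feat_w, feat_h, step, offset):
    """Return box centers (x, y) in image pixels for every feature map cell.

    offset=0.5 puts each center in the middle of its cell;
    offset=0.0 puts it at the cell's top left corner.
    """
    return [
        ((x + offset) * step, (y + offset) * step)
        for y in range(feat_h)
        for x in range(feat_w)
    ]


if __name__ == "__main__":
    # A 2x2 feature map where each cell covers 16 image pixels.
    print(tile_centers(2, 2, 16, 0.5))  # cell-centered placement
    print(tile_centers(2, 2, 16, 0.0))  # top-left placement
```

With the same `step`, the two layouts are shifted relative to each other by half a cell in both directions.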
Hi, Wei,
For the receptive field center calculation, you can refer to the tutorial. The formula does not support dilated convolution, but that case should be easy to derive. For a multi-branch network, such as Inception, the receptive field center is guaranteed to be the same across branches, and the receptive field size is the maximum over all branches.
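As a rough illustration, the sketch below uses a commonly cited recurrence for receptive field size, cumulative stride and first-cell center over a chain of conv/pool layers. It may not be identical to the formula in the tutorial mentioned above; like that formula, it ignores dilation. The function name `receptive_field` and the `(kernel, stride, pad)` tuple layout are assumptions.

```python
def receptive_field(layers):
    """Track receptive field geometry through a chain of layers.

    layers: list of (kernel, stride, pad) tuples, no dilation.
    Returns (jump, rf_size, start) where
      jump    = distance in input pixels between adjacent output cells,
      rf_size = receptive field size in input pixels,
      start   = center of the first output cell's receptive field,
                with input pixel i covering the interval [i, i + 1].
    """
    jump, rf_size, start = 1, 1, 0.5
    for kernel, stride, pad in layers:
        # Order matters: these updates use the jump of the layer's input.
        start = start + ((kernel - 1) / 2 - pad) * jump
        rf_size = rf_size + (kernel - 1) * jump
        jump = jump * stride
    return jump, rf_size, start


if __name__ == "__main__":
    # Two 3x3 convs (stride 1, pad 1) followed by a 2x2 max pool (stride 2).
    jump, rf_size, start = receptive_field([(3, 1, 1), (3, 1, 1), (2, 2, 0)])
    print(jump, rf_size, start)
```

Under these assumptions the center of output cell `i` is `start + i * jump`, which is the position a prior box would need to be centered on to line up with its receptive field.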
Hi Yong,
Thanks for the explanation.
1) I actually noticed this. But as the layers become coarser and coarser, the stride doesn't strictly follow the 2x rule anymore, right? For example, conv7_2's feature map size is 5x5 and conv8_2's feature map size is 3x3. I can use a 3x3 kernel with stride 2 and pad 1 to get conv8_2 from conv7_2, but I can also use a 3x3 kernel with stride 1 and pad 0 to get a conv8_2 of the same size. What should the stride be in the latter case? Would it be the same as conv7_2, that is 64? I probably need to handle it more carefully. Do you have any suggestions? It is not a problem for Faster R-CNN since the feature map it uses is still relatively large. I think for large objects the difference is relatively small, and SSD seems to do well on large objects.
Besides, do you think it is problematic to offset the center of a default box to the center of a cell instead of the top left corner?
2) What are those ground truth boxes which have low (0.1) IoU? Are they small ground truth boxes? Theoretically, the tiling of default boxes is better than the one in RPN. Faster R-CNN has an advantage because ROI pooling can help classify objects better. But maybe with a better placement of default boxes w.r.t. the receptive field of a kernel, SSD can have the same advantage as ROI pooling.
On the other hand, Faster R-CNN can only use a high-resolution feature map; otherwise, ROI pooling will have problems (many boxes will collapse into a single bin).
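For point 1) above, the arithmetic can be checked with the standard convolution output-size formula. This sketch assumes the usual definition of effective stride as the product of per-layer strides and takes the value 64 for conv7_2 from the comment; the helper names are hypothetical.

```python
def conv_output_size(n, kernel, stride, pad):
    """Spatial output size of a convolution (floor mode, no dilation)."""
    return (n + 2 * pad - kernel) // stride + 1


def effective_stride(input_stride, layer_stride):
    """Cumulative stride w.r.t. the image, assuming strides simply multiply."""
    return input_stride * layer_stride


if __name__ == "__main__":
    conv7_2_size, conv7_2_stride = 5, 64  # values quoted in the comment

    # Option A: 3x3 kernel, stride 2, pad 1.
    print(conv_output_size(conv7_2_size, 3, 2, 1),
          effective_stride(conv7_2_stride, 2))

    # Option B: 3x3 kernel, stride 1, pad 0.
    print(conv_output_size(conv7_2_size, 3, 1, 0),
          effective_stride(conv7_2_stride, 1))
```

Both options give a 3x3 map, but under this definition the cell spacing differs (128 versus 64 image pixels), so the feature map size alone does not determine where the cell centers fall.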
Hi Wei,
I have spent quite some time on your SSD code, and I have some questions about the implementation.
As for question 1, I think the centre of the prior bbox is not calculated correctly, which makes the model translation-variant. It may cause problems when the training and testing images have different sizes.
As for question 2, I have run some experiments. It turns out that for many ground truth bboxes, the greatest IoU with any prior bbox is under 0.5. Training the model with different random aspect ratios relieves the problem. Do you have any other method to solve the problem?
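A check like the one described for question 2 could be sketched as follows. This is not the experiment code from the thread; the box format `(xmin, ymin, xmax, ymax)` and the function names `iou` and `poorly_covered` are assumptions for illustration.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (xmin, ymin, xmax, ymax)."""
    inter_w = min(a[2], b[2]) - max(a[0], b[0])
    inter_h = min(a[3], b[3]) - max(a[1], b[1])
    if inter_w <= 0 or inter_h <= 0:
        return 0.0
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def poorly_covered(gt_boxes, prior_boxes, threshold=0.5):
    """List (gt index, best IoU) for ground truths whose best IoU
    over all priors is below the threshold."""
    result = []
    for i, gt in enumerate(gt_boxes):
        best = max((iou(gt, p) for p in prior_boxes), default=0.0)
        if best < threshold:
            result.append((i, best))
    return result


if __name__ == "__main__":
    priors = [(0, 0, 10, 10), (10, 0, 20, 10)]
    gts = [(0, 0, 10, 10), (4, 4, 6, 6)]  # the second box is much smaller
    print(poorly_covered(gts, priors))
```

In this toy case the small box lies entirely inside a prior yet still has a low IoU, because the union is dominated by the prior's area.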
Thank you!