xmyqsh / FPN

Feature Pyramid Network

FPN ROI Choosing #5

Open · Max-Fu opened this issue 7 years ago

Max-Fu commented 7 years ago

Hi there! As I was reading through Feature Pyramid Networks for Object Detection, I found a formula for choosing the feature map for each ROI based on the size of the region proposal. Can you show me how you implemented this? I want to implement FPN on the new Object Detection API provided by TensorFlow.

Max-Fu commented 7 years ago

k = k0 + log2(√(wh)/224), where k0 = 4, k is the pyramid level (output layer) to pool from, and w and h are the width and height of the region proposal.
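For example, a 112×112 proposal gives k = 4 + log2(112/224) = 4 - 1 = 3, so it would be pooled from P3.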

xmyqsh commented 7 years ago

In proposal_target_layer.py, around L101:

```python
import numpy as np

def calc_level(width, height):
    # k = k0 + log2(sqrt(w * h) / 224) with k0 = 4, clamped to [2, 5] (P2~P5)
    return min(5, max(2, int(4 + np.log2(np.sqrt(width * height) / 224))))

level = lambda roi: calc_level(roi[3] - roi[1], roi[4] - roi[2])  # roi: [0, x0, y0, x1, y1]

leveled_rois = [[], [], [], []]
leveled_rois[0] = [roi for roi in rois if level(roi) == 2]
leveled_rois[1] = [roi for roi in rois if level(roi) == 3]
leveled_rois[2] = [roi for roi in rois if level(roi) == 4]
leveled_rois[3] = [roi for roi in rois if level(roi) == 5]
```

This logic can be implemented either in proposal_target_layer or in roi_pooling_layer. Implemented in proposal_target_layer, it needs four roi_pooling_layer calls (one per pyramid level) but may benefit from CPU/GPU parallelism; implemented in roi_pooling_layer, it needs just one roi_pooling_layer call and makes better use of GPU acceleration. (The first option is sketched below.)

Do you think the latter is the better choice?
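For what it's worth, here is a minimal numpy sketch of the first option, with hypothetical `roi_pool` / `pool_by_level` helpers (these are illustration names, not layers from this repo): split the ROIs per level, pool each group against its pyramid map, then restore the original order.

```python
import numpy as np

def roi_level(roi):
    # Same formula as calc_level above: k = 4 + log2(sqrt(w*h)/224), clamped to [2, 5]
    w, h = roi[3] - roi[1], roi[4] - roi[2]
    return min(5, max(2, int(4 + np.log2(np.sqrt(w * h) / 224))))

def roi_pool(feature_map, rois, output_size=7):
    # Hypothetical stand-in for a real ROI pooling op; only the output shape matters here
    return np.zeros((len(rois), output_size, output_size, feature_map.shape[-1]))

def pool_by_level(pyramid, rois):
    # pyramid: {2: P2, 3: P3, 4: P4, 5: P5}; rois: float array [N, 5] as [0, x0, y0, x1, y1]
    outputs, order = [], []
    for k in (2, 3, 4, 5):
        idxs = [i for i, roi in enumerate(rois) if roi_level(roi) == k]
        if idxs:
            outputs.append(roi_pool(pyramid[k], rois[idxs]))  # one pooling call per level
            order.extend(idxs)
    pooled = np.concatenate(outputs, axis=0)
    restored = np.empty_like(pooled)
    restored[np.array(order)] = pooled  # restore the original ROI order
    return restored
```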

Max-Fu commented 7 years ago

Thank you for answering this question! I just finished my implementation of FPN on top of the new TensorFlow Object Detection API. I implemented this algorithm in the ROI pooling layer.

Max-Fu commented 7 years ago

Implementing it in the ROI pooling layer is definitely the better choice. I was just confused about where to add this formula.

xmyqsh commented 7 years ago

Hey man, how is your training result? The rpn_loss in my training is many times larger than the Fast R-CNN loss. Do you think I should also apply k = k0 + log2(√(wh)/224) in the anchor target layer? Does the paper mention this? I think it should be a reasonable improvement. (Setting w and h to the width and height of the ground-truth bbox in this layer.)
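For concreteness, a minimal sketch of what I mean, applying the same formula to the ground-truth boxes (hypothetical helper, not code from this repo):

```python
import numpy as np

def gt_level(gt_box):
    # gt_box: [x0, y0, x1, y1]; same formula as for proposals,
    # but with w and h taken from the ground-truth box
    w, h = gt_box[2] - gt_box[0], gt_box[3] - gt_box[1]
    return min(5, max(2, int(4 + np.log2(np.sqrt(w * h) / 224))))
```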

Max-Fu commented 7 years ago

The training result was not as good as the one reported in the paper (I only trained for 2 days). The RPN loss was also many times larger than the Faster R-CNN loss. (I don't know much about Fast R-CNN, though.) You can definitely try your method and see if it is correct.

Zehaos commented 7 years ago

Hi @xmyqsh @Max-Fu, the authors claim that they use 4-step training rather than end-to-end training (please refer to 5.2.2, Sharing Features). I implemented FPN using MXNet, and I have tried alternated training. The RPN result is good (8 points higher than the res50-c4 baseline), but the Fast R-CNN result is quite bad.

xmyqsh commented 7 years ago

@Zehaos Good! I will try it. But how do you evaluate the RPN result, AP or AR? Do you know where the AR in Table 1 is defined? Is it average recall?

Zehaos commented 7 years ago

@xmyqsh I used average recall for the evaluation (on the VOC dataset). Is Table 1 the evaluation result from the COCO tools? I'm not sure.

Johere commented 7 years ago

@xmyqsh Hi, you mentioned that the logic of choosing the feature map for each ROI can be implemented either in proposal_target_layer or in roi_pooling_layer. I implemented this algorithm in the ROI pooling layer but got a bad result. However, I find that proposal_target_layer is not used in the 'TEST' stage, while roi_pooling_layer is used in both the 'TRAIN' and 'TEST' stages. So should the implementations for these two situations be different? Is there anything wrong with my understanding?

xmyqsh commented 7 years ago

@Johere You are right. Among the three RPN-related layers, only the proposal layer is used in the 'TEST' phase.

- The anchor target layer generates the anchor deltas for RPN training.
- The proposal target layer generates the deltas of the proposal regions, as well as the proposal regions (ROIs) themselves, for Fast R-CNN training.
- The proposal layer generates the proposal regions (ROIs).
- ROI pooling crops each ROI from the feature map, then pools it into a unified 7x7 feature.

I implemented the logic of choosing the feature map for each ROI (k = k0 + log2(√(wh)/224)) in the proposal layer; I think it is better to call the results P2~P5-aware ROIs. My proposal layer outputs P2~P5-aware ROIs in the 'TEST' phase, which is different from its output in the 'TRAIN' phase.

Johere commented 7 years ago

@xmyqsh
Thank you very much! May I ask about your training results? I modified roi_pooling_layer to choose the feature map (P2/P3/P4/P5) before the ROI pooling operation, and the rest of the layer's code remains the same, but the result was bad... How about your implementation?

xmyqsh commented 7 years ago

@Johere

I implemented the feature-map (P2/P3/P4/P5) choosing operation in the proposal layer in the 'TEST' phase, and in the proposal target layer in the 'TRAIN' phase.

If you implement this in roi_pooling_layer, be aware that you need to record the mapping between the feature maps (P2/P3/P4/P5) and the ROIs, because the same mapping must be used again in the backward pass (see the sketch below).
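A minimal numpy sketch of that bookkeeping (hypothetical names, not the repo's roi_pooling_layer): the forward pass records the permutation that groups the ROIs by level, and the backward pass inverts it so each gradient flows back to its original ROI.

```python
import numpy as np

class LeveledRoiBookkeeping:
    # Sketch: remember which ROI went to which level in forward,
    # so backward can route each pooled gradient back correctly.

    def forward(self, rois, levels):
        # rois: [N, 5] array; levels: int array [N] with values in {2..5}
        self.order = np.argsort(levels, kind='stable')  # group ROIs by level
        return rois[self.order]

    def backward(self, grad_grouped):
        # grad_grouped: gradients in the level-grouped order from forward
        grad = np.empty_like(grad_grouped)
        grad[self.order] = grad_grouped  # undo the regrouping
        return grad
```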

If your RPN performance is not as good as the paper reports, and you use just one image per forward/backward pass instead of two as in the paper, I think you should use a lower learning rate than the paper's: there are not enough effective ROIs in rgs_loss (the bbox regression loss), so its gradient may be unstable, and a lower learning rate should be a better choice.

I'm still optimizing and testing my RPN performance. My previous end-to-end training result was bad.

Johere commented 7 years ago

@xmyqsh OK. Thank you for answering me!

xmyqsh commented 7 years ago

@Zehaos P6 should be included in the RPN head, but I encountered a numerical problem (NaN) during training when I added it. Have you encountered a similar problem?

Zehaos commented 7 years ago

@xmyqsh No. I use max pooling to downsample P5 and allow border anchors during training; the training is smooth.

xmyqsh commented 7 years ago

@Zehaos Same here. What's your max pooling kernel size, 3x3 or 1x1? And is your learning rate 0.02 as in the paper?

Zehaos commented 7 years ago

@xmyqsh Kernel size = 2 ... stride = 2. I used lr = 0.002 due to a smaller batch size (1 img/GPU × 4 GPUs).
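For reference, that downsampling (2x2 max pool, stride 2) can be sketched in a few lines of numpy; the function name is just for illustration:

```python
import numpy as np

def p5_to_p6(p5):
    # p5: [H, W, C]; a 2x2 max pool with stride 2 halves the spatial resolution
    h2, w2 = p5.shape[0] // 2, p5.shape[1] // 2
    return p5[:h2 * 2, :w2 * 2].reshape(h2, 2, w2, 2, -1).max(axis=(1, 3))
```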

xmyqsh commented 7 years ago

@Zehaos After using kernel size = 2, the NaN disappeared... Thank you!

Zehaos commented 7 years ago

@xmyqsh You are welcome.

xmyqsh commented 7 years ago

@Zehaos What image_batch_size do you use in the Fast R-CNN step of alternated training? Should a larger image_batch_size help training?

Zehaos commented 7 years ago

@xmyqsh I used image_batch_size = 2, roi_batch_size = 256. A larger image_batch_size should help because the ROIs are less correlated.

Feynman27 commented 6 years ago

@xmyqsh Your current implementation for choosing the pyramid level assigns all ROIs to every feature map. For example, sampling 128 ROIs and assigning each of them to all 4 pyramid levels (P2~P5) results in 512 ROIs per image. Is this deliberate? Shouldn't each ROI be assigned to a unique level of the feature pyramid, given by the formula in the paper?

For example, compare `leveled_idxs` in the two implementations below (they are not the same); a quick REPL check follows.

1. Every ROI index ends up in all four lists, because `[[]] * 4` creates four references to the same inner list:

```python
leveled_idxs = [[]] * 4
for idx, roi in enumerate(rois):
    level_idx = level(roi) - 2
    leveled_idxs[level_idx].append(idx)
```

2. Each ROI index goes only to its own level, determined by k = k0 + log2(√(wh)/224):

```python
leveled_idxs = [[], [], [], []]
for idx, roi in enumerate(rois):
    level_idx = level(roi) - 2
    leveled_idxs[level_idx].append(idx)
```
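A quick REPL check (illustrative, not from the repo) makes the aliasing visible:

```python
>>> a = [[]] * 4          # four references to one inner list
>>> a[0].append(1)
>>> a
[[1], [1], [1], [1]]
>>> b = [[], [], [], []]  # four independent lists
>>> b[0].append(1)
>>> b
[[1], [], [], []]
```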
stillwalker1234 commented 6 years ago

@Feynman27

That's a really subtle error. Have you tried training with that mod?

xmyqsh commented 6 years ago

@Feynman27 Good!

Feynman27 commented 6 years ago

Yes, but surprisingly, it didn't really change the mAP much. It actually dropped by about 0.5-1.0 percentage points.

xmyqsh commented 6 years ago

@Feynman27 Have you changed the related code in proposal_layer.py and proposal_target_layer.py simultaneously?

Feynman27 commented 6 years ago

@xmyqsh Yes.


hhchyer commented 6 years ago

@Feynman27 The formula for choosing the level in the proposal layer should be a balance of speed and accuracy. The proposals can even benefit from other layers.