hkzhang95 / DynamicRCNN

Dynamic R-CNN: Towards High Quality Object Detection via Dynamic Training, ECCV 2020
https://www.ecva.net/papers/eccv_2020/papers_ECCV/papers/123600256.pdf
MIT License

Some doubts about the details #5

Closed ys0823 closed 4 years ago

ys0823 commented 4 years ago

Thanks for the Dynamic R-CNN work. I have read the paper "Dynamic R-CNN: Towards High Quality Object Detection via Dynamic Training". For the Dynamic SmoothL1 Loss, the regression errors are computed in the source code as:

https://github.com/hkzhang95/DynamicRCNN/blob/de62c3a4c3c131da9679dbda689d37edbdaaa5a1/models/zhanghongkai/dynamic_rcnn/coco/dynamic_rcnn_r50_fpn_1x/network.py#L166

    # mean of |dx|, |dy| over every positive proposal
    raw_regression_targets = cat(
        [proposal.get_field("regression_targets")
         for proposal in raw_proposals], dim=0
    ).abs()[:, :2].mean(dim=1)
    # k-th smallest value (k = KE * IMS_PER_GPU, capped by the sample
    # count) is recorded as the new error used to update beta
    rcnn_error_new = torch.kthvalue(raw_regression_targets.cpu(), min(
        cfg.MODEL.DYNAMIC_RCNN.KE * cfg.SOLVER.IMS_PER_GPU,
        raw_regression_targets.size(0)))[0].item()

But I have some doubts. First, why does it use the mean offsets of x and y instead of the mean offsets of x, y, w, and h to calculate the regression errors? Also, the paper says the regression error is used to update beta, but the source code uses only the ground-truth offsets (the regression targets), which seems unreasonable.

Second, the hyper-parameter MODEL.DYNAMIC_RCNN.KE takes the values 8, 10, and 15. Is there a reason for these choices? Is it related to the anchor number or some other parameter? @hkzhang95
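For reference, the quoted snippet can be read as the following torch-free toy (the offsets below are made-up illustrative numbers, not from the repo): average |dx| and |dy| per positive, then take the k-th smallest value, where k = KE * IMS_PER_GPU capped by the sample count.

```python
# Toy, torch-free reading of the snippet above (hypothetical offsets).
targets = [(0.6, 0.2, 0.9, 0.1),    # (dx, dy, dw, dh) per positive
           (0.1, 0.3, 0.4, 0.2),
           (0.05, 0.15, 0.2, 0.1)]
# mean of |dx| and |dy| per positive, matching .abs()[:, :2].mean(dim=1)
errors = [(abs(dx) + abs(dy)) / 2 for dx, dy, _, _ in targets]
k = min(2, len(errors))             # k = KE * IMS_PER_GPU, capped
beta_new = sorted(errors)[k - 1]    # mirrors torch.kthvalue (1-indexed)
print(beta_new)
```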

hkzhang95 commented 4 years ago

Thanks for your question. Sorry for the late reply; I have been a little busy these days.

  1. Using the mean offsets of x & y is just one simple choice. From Figure 2 in the paper, the distributions of w & h are similar, so we use x & y for simplicity.
  2. The regression error is defined as the offset from the candidate box to its target ground-truth box, so I don't see where the problem is.
  3. KE can be viewed as a percentage of the positives. For example, with KI = 75, selecting KE = 15 means we record the regression error of the positive sample at the 20% mark. So choosing KE as 8 or 10 puts that percentage around 10% or 13%.

I hope this can solve your problem.
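The KE-to-percentage mapping in point 3 is simple arithmetic; as a quick check (KI = 75 positives per image is taken from the reply above):

```python
# KE / KI gives the quantile at which the regression error is recorded.
KI = 75
ratios = {ke: ke / KI for ke in (8, 10, 15)}
for ke, r in sorted(ratios.items()):
    print(f"KE={ke}: ~{100 * r:.1f}% of positives")
```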

ys0823 commented 4 years ago


Thanks for your reply, but I am still puzzled by the definition of regression error. In Fig. 4 the paper analyzes the relationship between regression error and gradient, and the abscissa is the regression error, i.e. the x in SmoothL1. However, see the implementation of SmoothL1 at https://github.com/hkzhang95/DynamicRCNN/blob/de62c3a4c3c131da9679dbda689d37edbdaaa5a1/dynamic_rcnn/det_opr/loss.py#L15

As I understand it, x is the absolute difference between the input and the target. So the abscissa "regression error" should be the error between the predicted offset and the target offset, not the offset from the candidate box to the target ground-truth box. Otherwise, if we use the candidate-to-ground-truth offset as the regression error, it never actually changes: for the same image it is a fixed number determined by the targets, independent of the predicted offsets. The dynamic beta would then not be really dynamic; it would just be a pre-determined statistic of the target offsets, and training would only change the image composition of each batch to get a different beta.
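For context, the SmoothL1 being discussed can be sketched in scalar form (a torch-free simplification of the loss the link above points to; x is |input - target|):

```python
def smooth_l1(x, beta):
    # Standard SmoothL1 with threshold beta: quadratic for |x| < beta,
    # linear beyond it. Shrinking beta steepens the inlier gradient
    # x / beta, which is the lever the Dynamic SmoothL1 Loss pulls.
    ax = abs(x)
    if ax < beta:
        return 0.5 * ax * ax / beta
    return ax - 0.5 * beta
```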

hkzhang95 commented 4 years ago

Sorry for the wrong comment. Actually, I had noticed the misleading wording and changed the expression from regression error to regression label in Figure 2. However, I forgot to change some other similar expressions in the paper; I will fix this in the next version.

Just for clarification (we use regression label to update beta):

Moreover, even if we use the offset from the candidate box to the target ground-truth box to update beta, it is still dynamic: in the second stage the candidate box is a proposal, which is not fixed across training.
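This point can be illustrated with a toy (made-up numbers): as training progresses the proposals sharpen, so the proposal-to-ground-truth offsets (the regression labels) shrink, and the k-th smallest label, and hence beta, shrinks with them.

```python
def beta_from_labels(labels, k):
    # k-th smallest label, 1-indexed and capped, mirroring torch.kthvalue
    return sorted(labels)[min(k, len(labels)) - 1]

early = [0.9, 0.7, 0.5, 0.4, 0.3]    # coarse early-training proposals
late = [0.3, 0.2, 0.15, 0.1, 0.05]   # refined late-training proposals
print(beta_from_labels(early, 2), beta_from_labels(late, 2))
```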