pytorch / vision

Datasets, Transforms and Models specific to Computer Vision
https://pytorch.org/vision
BSD 3-Clause "New" or "Revised" License

Ground-truth bounding boxes included in RPN proposals? #5735

Open florisdf opened 2 years ago

florisdf commented 2 years ago

Hi,

I am training a Faster-RCNN-like architecture and wanted to log the detection losses on the validation dataset. To obtain these losses, I passed in the targets of the validation data to the forward() method of GeneralizedRCNN. When investigating the results, however, I noticed that each ground-truth bounding box had a near-exact match with a predicted bounding box. When passing targets=None, these bounding boxes were gone. Is this the intended behavior? If so, what is the rationale behind it?

I figured that this line is where the predicted and ground-truth boxes get mixed:

https://github.com/pytorch/vision/blob/095cabb76cdcd4763bad629481d189b91e3df42c/torchvision/models/detection/roi_heads.py#L646

It seems to have something to do with balancing positive/negative sampling. Yet, I don't see how a model would improve from boxes that it did not generate itself.
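
For readers following along, here is a minimal sketch of the mixing step described above, assuming it amounts to a per-image concatenation of proposals and ground-truth boxes. The helper names `add_gt_to_proposals` and `pairwise_iou` are hypothetical and written for illustration; they are not torchvision's code. The sketch shows why every ground-truth box ends up with a perfectly matching box once targets are passed in: the appended box has an IoU of 1 with itself, so it always counts as a positive sample.

```python
import torch


def add_gt_to_proposals(proposals, gt_boxes):
    """Append each image's ground-truth boxes to that image's proposals.

    Both arguments are lists with one (N, 4) tensor per image,
    boxes given as (x1, y1, x2, y2). Hypothetical helper for illustration.
    """
    return [torch.cat((p, g)) for p, g in zip(proposals, gt_boxes)]


def pairwise_iou(a, b):
    """IoU between every box in a (N, 4) and every box in b (M, 4)."""
    area_a = (a[:, 2] - a[:, 0]) * (a[:, 3] - a[:, 1])
    area_b = (b[:, 2] - b[:, 0]) * (b[:, 3] - b[:, 1])
    lt = torch.max(a[:, None, :2], b[None, :, :2])  # top-left of intersection
    rb = torch.min(a[:, None, 2:], b[None, :, 2:])  # bottom-right of intersection
    wh = (rb - lt).clamp(min=0)                     # zero if boxes do not overlap
    inter = wh[..., 0] * wh[..., 1]
    return inter / (area_a[:, None] + area_b[None, :] - inter)


# One image: two made-up proposals that both miss the object, plus one GT box.
proposals = [torch.tensor([[0.0, 0.0, 10.0, 10.0], [50.0, 50.0, 60.0, 60.0]])]
gt_boxes = [torch.tensor([[20.0, 20.0, 40.0, 40.0]])]

mixed = add_gt_to_proposals(proposals, gt_boxes)
iou = pairwise_iou(gt_boxes[0], mixed[0])
print(mixed[0].shape)  # three boxes: two proposals plus the appended GT box
print(iou)             # only the appended GT box overlaps the ground truth
```

In this toy case neither proposal overlaps the object, so without the appended box there would be no positive sample for the image at all.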

java-abhinav07 commented 2 years ago

Facing the same issue. Using this line at test time gives an mAP of ~100%. However, including it at train time but not at test time results in poor convergence (unable to reach even 10% mAP). Should we be returning the original proposals during training as well?

java-abhinav07 commented 2 years ago

Having done some experiments, it turns out that training performance is not impacted by the add_gt_proposals method, and everything works fine as long as it is not used during inference. Performance was largely affected by the use of a pretrained RPN and FPN.

crazyboy9103 commented 10 months ago

See https://github.com/facebookresearch/maskrcnn-benchmark/issues/570#issuecomment-473218934. Basically, it's a hack to train the classifier without the box regressor receiving gradients.
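
One way to look at what a ground-truth proposal offers the box regressor is to compute its regression target. The sketch below assumes the standard Faster R-CNN delta encoding with unit weights; `encode_deltas` is a hypothetical helper, not a torchvision function, and whether this is exactly the reasoning in the linked comment is not confirmed here. Under that assumption, a proposal identical to its ground-truth box gets an all-zero target, so it carries no localization information, while it still provides a labelled positive for the classifier.

```python
import torch


def encode_deltas(proposals, gt):
    """Standard (dx, dy, dw, dh) encoding of gt boxes relative to proposals.

    Boxes are (x1, y1, x2, y2). Unit weights are assumed for simplicity.
    """
    pw = proposals[:, 2] - proposals[:, 0]
    ph = proposals[:, 3] - proposals[:, 1]
    px = proposals[:, 0] + 0.5 * pw
    py = proposals[:, 1] + 0.5 * ph

    gw = gt[:, 2] - gt[:, 0]
    gh = gt[:, 3] - gt[:, 1]
    gx = gt[:, 0] + 0.5 * gw
    gy = gt[:, 1] + 0.5 * gh

    dx = (gx - px) / pw        # centre shift, normalised by proposal width
    dy = (gy - py) / ph        # centre shift, normalised by proposal height
    dw = torch.log(gw / pw)    # log width ratio
    dh = torch.log(gh / ph)    # log height ratio
    return torch.stack((dx, dy, dw, dh), dim=1)


gt = torch.tensor([[20.0, 20.0, 40.0, 40.0]])
rpn_like = torch.tensor([[18.0, 22.0, 42.0, 38.0]])  # made-up imperfect proposal

deltas_gt = encode_deltas(gt, gt)         # GT box used as its own proposal
deltas_rpn = encode_deltas(rpn_like, gt)  # ordinary proposal near the object
print(deltas_gt)
print(deltas_rpn)
```

The imperfect proposal yields non-zero width and height deltas, which is the kind of target the regressor actually learns from.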