Unable to reproduce Faster RCNN evaluation metrics on pascal voc 2010 for Object Detection

kevalmorabia97 commented 4 years ago

Hi Everyone,

I am fine-tuning the pretrained Faster R-CNN model on the PASCAL VOC 2010 dataset for object detection, following this PyTorch fine-tuning tutorial: pytorch.org/tutorials/intermediate/torchvision_tutorial.html

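import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

# COCO-pretrained Faster R-CNN; replace the box head for 20 VOC classes + background
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True)

in_features = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes=21)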

I also tried changing the backbone to mobilenet_v2 as described in the above tutorial, but the results were much worse.

I am using this dataset loading code with a batch size of 2: https://github.com/pytorch/vision/issues/1097#issuecomment-508917489. I also apply the RandomHorizontalFlip transformation during training. I train the models using the code from the tutorial (github.com/pytorch/vision/blob/master/references/detection/engine.py).

The model's performance on the val split degrades after the 5th epoch, and the best mAP I could get is about 47%, which is much lower than the expected 69.9%. Please note that I train on the train split and evaluate on the val split, whereas in the paper the model is trained on trainval and tested on the test split, but I don't think this alone can explain such a large drop in performance.

import torch
from engine import train_one_epoch, evaluate  # references/detection/engine.py

# Optimize only the parameters that are not frozen
params = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.SGD(params, lr=0.0001, momentum=0.9, weight_decay=0.005)

# optimizer = torch.optim.Adam(params, lr=0.0001, weight_decay=0.005)
# Adam gives much worse results (< 10% mAP!) for some reason!

for epoch in range(30):
    train_one_epoch(model, optimizer, train_loader, device, epoch, print_freq=1000)
    evaluate(model, val_loader, device=device)

 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.472
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.768
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.522
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.188
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.402
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.518
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.412
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.599
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.607
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.318
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.535
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.650

Can anyone please help me resolve the issue? Do I have to make any changes to the default parameters in torchvision's faster_rcnn implementation?

Specifications: Python v3.7.3, PyTorch v1.3.0, torchvision v0.4.1, CUDA v10.1, GPU NVIDIA GeForce RTX 2080 8GB

Thanks for your time!

fmassa commented 4 years ago

Hi,

This can be due to a number of factors. You are training the model from scratch, which might require a lot of training iterations before it starts giving meaningful results, especially because Pascal is small.

I think the key issue here, though, is that there might be a problem with your dataset implementation: it may not be returning things in the format that the evaluation code expects.
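A quick sanity check of the targets returned by the dataset can rule this out. The sketch below is only an illustration: `check_target` is a hypothetical helper, not part of torchvision, and it assumes the target format described in the torchvision detection tutorial (`boxes` as a float tensor of shape [N, 4] in [x0, y0, x1, y1] order, `labels` as an int64 tensor with 0 reserved for background).

```python
import torch


def check_target(target):
    """Return a list of problems found in one detection target dict.

    Hypothetical helper; the expected format is an assumption taken from
    the torchvision detection tutorial.
    """
    boxes = target.get("boxes")
    labels = target.get("labels")
    if boxes is None or labels is None:
        return ["target must contain 'boxes' and 'labels'"]

    if boxes.dim() != 2 or boxes.shape[1] != 4:
        return ["'boxes' must have shape [N, 4]"]

    problems = []
    if not torch.is_floating_point(boxes):
        problems.append("'boxes' must be a floating point tensor")
    if labels.dtype != torch.int64:
        problems.append("'labels' must be an int64 tensor")
    if labels.shape[0] != boxes.shape[0]:
        problems.append("'labels' and 'boxes' must have the same length")

    # Boxes are expected as [x0, y0, x1, y1], so width and height must be positive.
    widths = boxes[:, 2] - boxes[:, 0]
    heights = boxes[:, 3] - boxes[:, 1]
    if (widths <= 0).any() or (heights <= 0).any():
        problems.append("degenerate box: expected x1 > x0 and y1 > y0")

    # Label 0 is treated as background, so object labels should start at 1.
    if labels.numel() > 0 and labels.min() < 1:
        problems.append("object labels must be >= 1 (0 is background)")
    return problems
```

Running this over a few samples, for example `check_target(dataset[i][1])`, would flag boxes stored as [x, y, width, height] or labels that start at 0.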

I would visualize the output of the model after the 10 epochs that you currently have and see if they start to look reasonable or not yet.

Without further information, it's difficult to say much more about it.

kevalmorabia97 commented 4 years ago

Hi @fmassa, I just updated my post above. I was able to resolve the issue of all metrics being 0 by switching from the mobilenet_v2 backbone to the default pretrained backbone, and from the Adam optimizer to SGD. I am still getting much worse results than expected (47% vs 70% mAP). Could you please help me out?

fmassa commented 4 years ago

As a start, I would increase the number of training epochs and the learning rate. Also, increasing the training set size (by using trainval instead of train) will definitely improve performance.

WZMIAOMIAO commented 4 years ago

Hi, my personal opinion is that your result is good. The mAP of 69.9 reported for Faster R-CNN is measured at IoU=0.50, not at IoU=0.50:0.95. In your results, the mAP at IoU=0.50 is 76.8.
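To make the distinction concrete, here is a minimal sketch that labels the 12 values of the COCO summary in the order they are printed above. The helper `name_coco_stats` and the key names are made up for illustration. With the torchvision reference code, the same 12 numbers should be available as `coco_evaluator.coco_eval["bbox"].stats` on the object returned by `evaluate`, but treat that access path as an assumption to verify against your version.

```python
# Order of the 12 summary values, matching the printed COCO evaluation output.
COCO_STAT_NAMES = [
    "AP@[0.50:0.95]", "AP@0.50", "AP@0.75",
    "AP_small", "AP_medium", "AP_large",
    "AR_maxDets1", "AR_maxDets10", "AR_maxDets100",
    "AR_small", "AR_medium", "AR_large",
]


def name_coco_stats(stats):
    """Map the 12 COCO summary values to readable names (hypothetical helper)."""
    if len(stats) != len(COCO_STAT_NAMES):
        raise ValueError("expected 12 summary values")
    return dict(zip(COCO_STAT_NAMES, (float(s) for s in stats)))


# Values copied from the evaluation output posted above.
reported = [0.472, 0.768, 0.522, 0.188, 0.402, 0.518,
            0.412, 0.599, 0.607, 0.318, 0.535, 0.650]
metrics = name_coco_stats(reported)

coco_map = metrics["AP@[0.50:0.95]"]  # averaged over IoU thresholds 0.50 to 0.95
ap50 = metrics["AP@0.50"]             # closest counterpart to the VOC-style mAP
print("COCO-style mAP:", coco_map)
print("mAP at IoU=0.50:", ap50)
```

The 47% figure is the first value, while the 69.9% in the paper should be compared against the second. The COCO and VOC protocols also differ in how they compute AP, so the two numbers are comparable only approximately.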

kevalmorabia97 commented 4 years ago

Thanks @WZMIAOMIAO for pointing out my mistake! I was able to get even better results by reducing weight decay to 1e-6 and adding learning rate decay.

After training on VOC 2010 train for 10 epochs, I got mAP[@IoU=0.5] = 79.4 on VOC 2010 val. After training on VOC 2007 trainval for 10 epochs, I got mAP[@IoU=0.5] = 86.0 on VOC 2007 test.
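For readers who want to try the same changes, the following is a rough sketch of an optimizer with the reduced weight decay and a step learning rate schedule. Only `weight_decay=1e-6` comes from this thread. The learning rate of 0.005 and the `StepLR` settings (`step_size=3`, `gamma=0.1`) are assumptions borrowed from the torchvision tutorial, not the exact values used for the results above, and the linear layer is a stand-in for the detection model.

```python
import torch

# Stand-in for the detection model so the sketch runs on its own.
model = torch.nn.Linear(4, 2)

params = [p for p in model.parameters() if p.requires_grad]
# weight_decay=1e-6 is from the thread; lr=0.005 is an assumed value.
optimizer = torch.optim.SGD(params, lr=0.005, momentum=0.9, weight_decay=1e-6)
# Assumed schedule: multiply the learning rate by 0.1 every 3 epochs.
lr_scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=3, gamma=0.1)

lrs = []
for epoch in range(10):
    lrs.append(optimizer.param_groups[0]["lr"])
    # train_one_epoch(model, optimizer, train_loader, device, epoch, print_freq=1000)
    optimizer.step()      # no gradients in this sketch, so parameters are unchanged
    lr_scheduler.step()   # advance the schedule once per epoch
    # evaluate(model, val_loader, device=device)

print(lrs)
```

In a real run the two commented calls would replace the bare `optimizer.step()`, since `train_one_epoch` already steps the optimizer on every batch.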

pytorch/vision issue #2095