rafaelpadilla / review_object_detection_metrics

Object Detection Metrics. 14 object detection metrics: mean Average Precision (mAP), Average Recall (AR), Spatio-Temporal Tube Average Precision (STT-AP). This project supports different bounding box formats as in COCO, PASCAL, Imagenet, etc.

COCO results are slightly different from your implementation #98

Closed andreaceruti closed 2 years ago

andreaceruti commented 2 years ago

When using the official COCO API I get these results on my dataset:

```
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.430
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.706
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.455
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.096
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.428
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.513
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.052
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.387
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.490
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.101
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.477
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.586
```

Instead, when using this tool I get this:

```
Results for COCO
AP: 0.43250168920233456
AP50: 0.7083519513734896
AP75: 0.45794547410493686
APsmall: 0.01306930693069307
APmedium: 0.3503581513380598
APlarge: 0.5008855413320579
AR1: 0.052450980392156864
AR10: 0.38823529411764707
AR100: 0.4913398692810458
ARsmall: 0.03333333333333333
ARmedium: 0.40162162162162157
ARlarge: 0.5640897755610973
```

So the first 3 AP values and the 3 AR values are almost the same, but the ones referring to small/medium/large show some significant differences in my opinion. I am opening this issue just as a note for you, but maybe I am missing something.

andreaceruti commented 2 years ago

@rafaelpadilla I am sharing the images, the ground-truth file and the detection file in a zip folder on my drive, so if you want you can try it yourself. Feel free to close the issue whenever you want. This is the simple script I use on Colab to run the evaluation with the official COCO API:

```python
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval
import json

images_path = ...
gt_file_path = ...
dt_file_path = ...

detections = json.load(open(dt_file_path))
detections_list = detections["annotations"]

coco_gt = COCO(gt_file_path)
coco_dt = coco_gt.loadRes(detections_list)
coco_eval = COCOeval(coco_gt, coco_dt, "bbox")
coco_eval.evaluate()
coco_eval.accumulate()
coco_eval.summarize()
```
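For reference, the corresponding evaluation with this repo can be sketched roughly as below. The converter and evaluator names used here (coco2bb, get_coco_summary, BBType and their module paths) are assumptions based on this repo's source tree, so the exact imports and signatures should be checked before running:

```python
# Rough sketch only: module paths and signatures below are assumed, not verified.
from src.utils.converter import coco2bb                       # assumed converter location
from src.utils.enumerators import BBType                      # assumed enum location
from src.evaluators.coco_evaluator import get_coco_summary    # assumed evaluator location

# gt_file_path / dt_file_path are the same COCO-format JSON files used above.
gt_bbs = coco2bb(gt_file_path, bb_type=BBType.GROUND_TRUTH)   # assumed signature
det_bbs = coco2bb(dt_file_path, bb_type=BBType.DETECTED)      # assumed signature

# get_coco_summary is expected to return the AP/AP50/.../ARlarge dictionary
# shown in the results above.
print(get_coco_summary(gt_bbs, det_bbs))
```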

Jelly123456 commented 2 years ago

I faced a similar problem. I also used my own dataset for evaluation, and the results differ hugely: this repo gives mAP@0.5 of 0.4, while the same dataset gives mAP@0.5 of 0.7 when running validation with yolov5 (https://github.com/ultralytics/yolov5).

Waiting for the author's replies.

rafaelpadilla commented 2 years ago

@Jelly123456 @andreaceruti

The metrics that take the object areas into account are the ones that presented different results.

This can be explained by the fact that, by default, pycocotools does not consider the area of the bounding boxes, but the area of the segmented object instead (the annotation's "area" field). See here that this is the default behavior.
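For reference, the thresholds behind the small/medium/large buckets can be inspected directly in pycocotools; the key point is that COCOeval compares each annotation's "area" field against these ranges, and for segmentation-style ground truth that field holds the mask area, not width × height. A minimal check:

```python
from pycocotools.cocoeval import Params

# Default area ranges used by COCOeval for the all/small/medium/large buckets.
# During evaluation, each annotation's "area" field is compared against these
# ranges; for segmentation-style annotations that field holds the mask area.
p = Params(iouType="bbox")
for label, rng in zip(p.areaRngLbl, p.areaRng):
    print(label, rng)
# all    [0,       1e5**2]
# small  [0,       32**2 ]
# medium [32**2,   96**2 ]
# large  [96**2,   1e5**2]
```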

Other important differences:

@andreaceruti, in your detection file, one of your detections is:

[{"segmentation": [], "area": 63738, "iscrowd": 0, "ignore": 0, "image_id": 0, "bbox": [499.2635498046875, 741.0051879882812, 184.719482421875, 446.47088623046875] See that the "area" is described as 63738, this is the segmented area, which might be used by cocopytools. Nevertheless, our tool uses the area 82471 (185 * 446 = 82510).

This was also explained here.
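One way to make the area-based metrics comparable between the two tools is to overwrite the "area" field of every annotation with the bounding-box area before running pycocotools, so that the small/medium/large buckets in both tools are computed from the bounding box. A minimal sketch, with placeholder file names:

```python
import json

def use_bbox_area(coco_dict):
    # Replace each annotation's "area" (segmented area) with w * h from its bbox,
    # so COCOeval's small/medium/large buckets are based on the bounding box.
    for ann in coco_dict.get("annotations", []):
        _, _, w, h = ann["bbox"]
        ann["area"] = w * h
    return coco_dict

# "gt.json" / "dt.json" are placeholder names for the ground-truth and detection files.
for name in ("gt.json", "dt.json"):
    with open(name) as f:
        data = use_bbox_area(json.load(f))
    with open(name.replace(".json", "_bbox_area.json"), "w") as f:
        json.dump(data, f)
```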

I hope that clears your doubts. :)

rafaelpadilla commented 2 years ago

I just debugged pycocotools considering the 4 differences I mentioned in my previous message, and the results got closer, but there are still some differences.

It requires a deeper investigation, which I don't have time to do now.

I suggest changing your gt and dt files to have only a few bounding boxes and debugging both codes to find the exact point of divergence. If any of you could do that, I would appreciate your help. If you find the solution, please open a PR and I will approve it.
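A small helper along these lines (names are purely illustrative) could trim both COCO-format files down to a couple of images before stepping through the two implementations side by side:

```python
import json

def keep_first_images(coco_dict, n=2):
    # Keep only the first n images and the annotations/detections that belong to them.
    kept_ids = {img["id"] for img in coco_dict["images"][:n]}
    coco_dict["images"] = [img for img in coco_dict["images"] if img["id"] in kept_ids]
    coco_dict["annotations"] = [a for a in coco_dict["annotations"] if a["image_id"] in kept_ids]
    return coco_dict

# Example: shrink the ground-truth file ("gt.json" is a placeholder path).
with open("gt.json") as f:
    small_gt = keep_first_images(json.load(f))
with open("gt_small.json", "w") as f:
    json.dump(small_gt, f)
```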

andreaceruti commented 2 years ago

Thank you a lot for your clarifications! In fact, I am doing instance segmentation on my dataset, so I removed the segmentation part of the detections without recalculating the area from the bounding-box coordinates. Maybe it is a problem with my annotation file; anyway, yes, I will open a PR if I manage to debug the divergence.