Note that the convention used by this code deviates from the COCO convention. Whereas this code blindly matches detections with ground truths based solely on IOU, COCO also considers the detected class. That is, in the COCO convention, a detection is a true positive if its class matches that of a ground truth with which its IOU exceeds the IOU threshold (if multiple such detections exist, the highest-confidence one is taken as the true positive), even if another detection has a higher IOU with that ground truth. This makes sense, since a detection of the correct class with an IOU above the threshold is by definition correct. In the code in this repository, if a detection of an erroneous class happens to have a larger IOU with the ground truth, that detection claims the match (recorded as a class confusion) and the actual true positive is discarded as an erroneous detection, since each detection and each ground truth can have only one match (rows must sum to the total number of detections per class and columns must sum to the total number of ground truths / labels per class). COCO also generally considers at most 100 detections per class.
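To make the difference concrete, here is a small hypothetical scenario (the classes, IOU values and scores below are made up purely for illustration and do not come from this repository):

# One ground truth of class "car", two detections overlapping it.
gt = {"class": "car"}
detections = [
    {"class": "car",   "iou": 0.6, "score": 0.9},   # correct class, lower IOU
    {"class": "truck", "iou": 0.8, "score": 0.7},   # wrong class, higher IOU
]

# COCO convention: only same-class detections above the threshold can be true positives,
# so the "car" detection is the match and the "truck" detection is a plain false positive.
coco_match = max((d for d in detections if d["class"] == gt["class"] and d["iou"] > 0.5),
                 key=lambda d: d["score"], default=None)

# This repository's convention: the ground truth is claimed by the highest-IOU detection
# regardless of class, so a car-vs-truck confusion is recorded and the correct detection
# is treated as spurious.
repo_match = max((d for d in detections if d["iou"] > 0.5), key=lambda d: d["iou"], default=None)

print(coco_match["class"], repo_match["class"])  # -> car truck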
In order to fix this issue, you must handle the two separate cases of identifying "true positive detections" and "false positive detections". Start by sorting the input by confidence (usually done anyway when processing model outputs) and, for each class, drop detections beyond index 99 (detections_of_class = sorted_detections_of_class[:100]). The "true positive detections" should be generated first, by matching detections of the correct class with IOU > threshold (again, if multiple detections match the same ground truth, the highest-confidence one is taken as the true positive), counting each ground truth and each detection only once (i.e., a detection can be matched to at most one ground truth and a ground truth can have at most one match). The "false positive detections" should then "pick up the scraps" from the "true positive detections" by matching any remaining unmatched detections and ground truths (i.e., those not included in the "true positive detections") that have an IOU > threshold but a mismatching class. These two sets of matches ("true positive detections" and "false positive detections") can then be concatenated into "all_matches"; a sketch of this two-pass matching is shown below.
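The following is a minimal sketch of that two-pass matching, not the repository's actual API: it assumes detections and ground truths are plain dicts with "box", "class" and "score" keys, and the names compute_iou and match_detections are illustrative.

def compute_iou(box_a, box_b):
    # IOU of two boxes given as [x1, y1, x2, y2].
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    union = ((box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
             + (box_b[2] - box_b[0]) * (box_b[3] - box_b[1]) - inter)
    return inter / union if union > 0 else 0.0

def match_detections(detections, ground_truths, iou_threshold=0.5, max_dets=100):
    # Sort detections by confidence and keep at most max_dets per class.
    order = sorted(range(len(detections)), key=lambda i: detections[i]["score"], reverse=True)
    kept, per_class = [], {}
    for i in order:
        c = detections[i]["class"]
        if per_class.get(c, 0) < max_dets:
            per_class[c] = per_class.get(c, 0) + 1
            kept.append(i)

    matched_dets, matched_gts = set(), set()
    true_positive_matches, false_positive_matches = [], []

    def greedy_pass(require_same_class, matches):
        # Detections are visited in confidence order, so each ground truth is
        # claimed by the highest-confidence detection that reaches it, and each
        # detection / ground truth is used at most once.
        for i in kept:
            if i in matched_dets:
                continue
            best_j, best_iou = None, iou_threshold
            for j, gt in enumerate(ground_truths):
                if j in matched_gts:
                    continue
                if (gt["class"] == detections[i]["class"]) != require_same_class:
                    continue
                iou = compute_iou(detections[i]["box"], gt["box"])
                if iou > best_iou:
                    best_j, best_iou = j, iou
            if best_j is not None:
                matches.append((i, best_j))
                matched_dets.add(i)
                matched_gts.add(best_j)

    # Pass 1: "true positive detections" -- correct class and IOU > threshold.
    greedy_pass(True, true_positive_matches)
    # Pass 2: "false positive detections" pick up the scraps -- wrong class but IOU > threshold.
    greedy_pass(False, false_positive_matches)

    all_matches = true_positive_matches + false_positive_matches
    return true_positive_matches, false_positive_matches, all_matches

The returned all_matches pairs can then fill the confusion matrix as before, with unmatched ground truths counted as missed labels and unmatched detections as spurious detections.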
You can validate that your code works by comparing the recall computed from your confusion matrix with the recall computed inside the CocoEvaluator from the coco_eval module (the snippet below extracts the COCO recall for iou_threshold=0.5, all areas, and maxDets=100):
evaluator = CocoEvaluator(coco_gt=validation_set.coco, iou_types=["bbox"])
# eval['recall'] is indexed as [iou_threshold, class, area_range, max_dets]
coco_recall = evaluator.coco_eval['bbox'].eval['recall'][0, i, 0, 2]  # i is the index of the relevant class
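Note that eval['recall'] is only populated after the evaluator has been fed predictions and accumulated. The steps that must run between constructing the evaluator and reading the recall look roughly as follows; this assumes the torchvision/DETR-style coco_eval module, and model, validation_loader and confusion_matrix are placeholder names, not part of this repository:

for images, targets in validation_loader:                     # placeholder data loader
    outputs = model(images)                                    # placeholder model; must yield boxes/scores/labels per image
    predictions = {t["image_id"].item(): o for t, o in zip(targets, outputs)}
    evaluator.update(predictions)                               # feed detections to the underlying COCOeval

evaluator.synchronize_between_processes()
evaluator.accumulate()                                          # fills evaluator.coco_eval['bbox'].eval
evaluator.summarize()

# Recall from your own confusion matrix, assuming rows are predicted classes and
# columns are ground-truth classes (so column i sums to the number of labels of class i).
own_recall = confusion_matrix[i, i] / confusion_matrix[:, i].sum()

The two recall values should agree if the matching convention described above has been implemented correctly.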