Closed Jayden9912 closed 1 year ago
@Jayden9912 for nuScenes detection evaluation, I don't think "if two predicted bboxes are associated with the same ground truth bboxes, both will be considered as true positives"
If you take a look at this line of code: https://github.com/nutonomy/nuscenes-devkit/blob/a27342283ec865f83424e936f2ee09494b591ec4/python-sdk/nuscenes/eval/detection/algo.py#L81
For a given sample, once a predicted box has been matched with a ground-truth (GT) box, the GT box is recorded in `taken` and is no longer available as a match for other predicted boxes
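This matching logic can be sketched roughly as follows (a simplified, hypothetical `greedy_match` over 2D centers, not the devkit code itself):

```python
import numpy as np

def greedy_match(gt_centers, pred_centers_sorted, dist_th):
    """Match predictions (already sorted by descending confidence) to GT centers.

    Each GT can be matched at most once: once matched, its index goes into
    `taken` and later predictions can no longer claim it.
    """
    taken = set()
    results = []  # True = TP, False = FP, one entry per prediction
    for pred in pred_centers_sorted:
        best_dist, best_idx = np.inf, None
        for gt_idx, gt in enumerate(gt_centers):
            if gt_idx in taken:
                continue  # this GT already belongs to a higher-confidence prediction
            d = np.linalg.norm(np.asarray(pred) - np.asarray(gt))
            if d < best_dist:
                best_dist, best_idx = d, gt_idx
        if best_dist < dist_th:
            taken.add(best_idx)
            results.append(True)
        else:
            results.append(False)
    return results

# Two predictions near the same single GT: only the first (higher-confidence)
# one becomes a TP; the second is an FP because the GT is already taken.
print(greedy_match([(0.0, 0.0)], [(0.1, 0.0), (0.2, 0.0)], dist_th=2.0))
# [True, False]
```

So a second prediction on an already-matched GT counts as an FP rather than a second TP.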
My mistake. Thanks for the help
hi @whyekit-motional , I am trying to understand the mAP calculation as implemented. Given this scenario (2 predictions, 3 ground-truth boxes):

| prediction bbox rank | conf score | match with GT? | TP | FP | Precision | Recall |
|---|---|---|---|---|---|---|
| bbox1 | 0.9 | 0 | 0 | 1 | 0 | 0 |
| bbox2 | 0.8 | 1 | 1 | 1 | 0.5 | 0.33 |
The graph shape should look like this (image not reproduced here):
According to the documentation from scikit learn, they would always make sure that when recall is 0, precision would be 1. "The last precision and recall values are 1. and 0. respectively and do not have a corresponding threshold. This ensures that the graph starts on the y axis."
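That convention can be reproduced by hand with the numbers from the table above (numpy only; a sketch of the convention, not scikit-learn's actual code):

```python
import numpy as np

# Cumulative TP/FP from the two-row table above (bbox1 is an FP, bbox2 a TP)
tp = np.array([0.0, 1.0])
fp = np.array([1.0, 1.0])
num_gt = 3

precision = tp / (tp + fp)  # [0.0, 0.5]
recall = tp / num_gt        # [0.0, 0.333...]

# scikit-learn's precision_recall_curve additionally appends precision = 1
# and recall = 0 (with no corresponding threshold), so the curve always
# starts on the y-axis; reproducing that convention by hand:
precision = np.append(precision, 1.0)
recall = np.append(recall, 0.0)
print(precision, recall)
```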
I see that there is some filtering based on min_recall and min_precision here, but the result is still quite different from the scikit-learn version.
From my understanding, the nuScenes implementation penalizes the model's performance a lot when the highest-confidence detection doesn't match any GT. Is my understanding correct, or did I miss anything?
Here is the code:
```python
# Script for mAP evaluation (nuScenes-style)
from typing import Callable

import numpy as np

MIN_PRECISION = 0.1
MIN_RECALL = 0.1


def center_distance(gt_box, pred_box):
    """L2 distance between the (x, y) centers of a GT box and a predicted box."""
    return np.linalg.norm(np.array(pred_box[:2]) - np.array(gt_box[:2]))


def accumulate(gt_bboxes: np.ndarray, pred_bboxes: np.ndarray, dist_th: float, dist_func: Callable):
    """Accumulate TP/FP statistics and compute AP for a single class.

    gt_bboxes:   (N, 6) array of (x, y, z, l, w, h) boxes for this class
    pred_bboxes: (M, 7) array of (x, y, z, l, w, h, conf) boxes for this class
    """
    pred_confs = pred_bboxes[:, -1]
    num_gt = len(gt_bboxes)

    # Sort predictions by confidence, in descending order
    sortind = np.argsort(pred_confs)[::-1]

    tp = []    # accumulator for true positives
    fp = []    # accumulator for false positives
    conf = []  # accumulator for confidences
    taken = set()  # indices of GT boxes that have already been matched

    for ind in sortind:
        pred_box = pred_bboxes[ind]
        this_conf = pred_confs[ind]

        # Find the closest match among the GT boxes not yet taken
        min_dist = np.inf
        match_gt_idx = None
        for gt_idx, gt_box in enumerate(gt_bboxes):
            if gt_idx in taken:
                continue  # already matched to a higher-confidence prediction
            this_distance = dist_func(gt_box, pred_box)
            if this_distance < min_dist:
                min_dist = this_distance
                match_gt_idx = gt_idx

        # Update tp, fp and confs
        is_match = min_dist < dist_th
        if is_match:
            taken.add(match_gt_idx)
            tp.append(1)
            fp.append(0)
        else:
            tp.append(0)
            fp.append(1)
        conf.append(this_conf)

    if len(tp) == 0:  # no predictions at all
        return 0.0

    # Accumulate
    tp = np.cumsum(tp).astype(float)
    fp = np.cumsum(fp).astype(float)
    conf = np.array(conf)

    # Calculate precision and recall
    prec = tp / (fp + tp)
    rec = tp / float(num_gt)
    print("prec:", prec)
    print("tp:", tp)
    print("rec:", rec)

    # 101-point interpolation, from 0% to 100% recall
    rec_interp = np.linspace(0, 1, 101)
    prec = np.interp(rec_interp, rec, prec, right=0)
    prec = prec[round(100 * MIN_RECALL) + 1:]
    prec -= MIN_PRECISION
    prec[prec < 0] = 0
    ap = np.mean(prec) / (1 - MIN_PRECISION)
    print("AP:", ap)
    return ap


if __name__ == "__main__":
    gt_boxes = np.array([(1, 2, 3, 4, 5, 6), (2, 3, 4, 5, 6, 7), (11, 21, 31, 41, 51, 61)])  # (x, y, z, l, w, h)
    pred_boxes = np.array([(-3, -3, 3.5, 4.5, 5.5, 6.5, 0.9), (2.5, 3.5, 4.5, 5.5, 6.5, 7.5, 0.8)])  # (x, y, z, l, w, h, conf)
    accumulate(gt_boxes, pred_boxes, 1, center_distance)
```
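To put numbers on the question about a high-confidence FP, here is a rough check using the same 101-point interpolation and min-recall/min-precision clipping as above (`nuscenes_style_ap` is a hypothetical condensed helper, not devkit code):

```python
import numpy as np

MIN_PRECISION, MIN_RECALL = 0.1, 0.1

def nuscenes_style_ap(tp_flags, num_gt):
    """AP with 101-point interpolation and min-recall/min-precision clipping,
    mirroring the nuScenes formula (tp_flags ordered by descending confidence)."""
    tp = np.cumsum(tp_flags).astype(float)
    fp = np.cumsum(1 - np.asarray(tp_flags)).astype(float)
    prec = tp / (tp + fp)
    rec = tp / float(num_gt)
    rec_interp = np.linspace(0, 1, 101)
    prec = np.interp(rec_interp, rec, prec, right=0)
    prec = prec[round(100 * MIN_RECALL) + 1:]
    prec -= MIN_PRECISION
    prec[prec < 0] = 0
    return float(np.mean(prec)) / (1 - MIN_PRECISION)

# Same two detections (1 = TP, 0 = FP), 3 GT boxes, only the confidence
# ordering differs; the TP-first ordering yields a clearly higher AP.
ap_fp_first = nuscenes_style_ap([0, 1], num_gt=3)  # highest-confidence box is an FP
ap_tp_first = nuscenes_style_ap([1, 0], num_gt=3)  # highest-confidence box is a TP
print(ap_fp_first, ap_tp_first)
```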
@Jayden9912 in response to your question:
> the nuScenes implementation penalizes the model's performance a lot when the highest-confidence detection doesn't match any GT. Is my understanding correct, or did I miss anything?
mAP itself - not just specifically the implementation of mAP in nuScenes - penalizes the performance of a model a lot when its highest-confidence detection doesn't match any GT (i.e. it is a FP)
And this is rightly so - a model which is extremely confident but produces a FP detection should logically be worse than a model which is slightly less confident but produces a TP detection
I see. Will study more.
Thanks again for the clarification!
Hi.
I noticed that in the COCO evaluation, if two predicted bboxes are associated with the same ground-truth bbox, one is counted as a true positive and the other as a false positive (assuming it matches no other GT bbox).
But in the nuScenes evaluation, it looked to me as if both would be counted as true positives when two predicted bboxes are associated with the same ground-truth bbox.
Is there a reason it is calculated this way in the nuScenes evaluation?