kaist-avelab / K-Radar

4D Radar Object Detection for Autonomous Driving in Various Weather Conditions

Unclear Evaluation Annotation Format #23

Closed: ffent closed this issue 1 month ago

ffent commented 9 months ago

Issue
@DongHeePaek could you please clarify the correct annotation format used for the evaluation?

TL;DR
Which format is used for the .txt files loaded in validate_kitti?

Details
Currently, there is a lot of confusion regarding the correct annotation format, since multiple different formats are used throughout the code and the data. However, it is important to know which format is required to run the official evaluation.

  1. According to KRadarDetection_v1_0 the labels are given as:

    _, _, obj_id, class_name, x, y, z, theta, l, w, h

  2. The labels for model training are given as cls_name, cls_idx, (xc, yc, zc, rz, xl, yl, zl), _

  3. The model predictions are given as score, xc, yc, zc, xl, yl, zl, rot

  4. To save the model results, both the predictions and the labels are converted within dict_datum_to_kitti to cls_val_keyword 0.00 0 0 50 50 150 150 zl yl xl yc zc xc rz

  5. After saving, the predictions and labels are loaded again and, as sketched below, the order is changed in get_label_annos to cls_val_keyword 0.00 0 0 50 50 150 150 xl zl yl yc zc xc rz
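
To make the round trip in steps 4 and 5 concrete, here is a minimal, self-contained sketch. It is not the repository's code: to_kitti_line and from_kitti_line are hypothetical helpers, and the trailing score column for predictions is assumed from the usual KITTI convention.

```python
# Minimal sketch of the save/load ordering described in steps 4 and 5.
# to_kitti_line / from_kitti_line are hypothetical helpers, not K-Radar code.

def to_kitti_line(cls_val_keyword, xc, yc, zc, xl, yl, zl, rz, score=None):
    # Step 4 (dict_datum_to_kitti ordering): ... 50 150 150 zl yl xl yc zc xc rz
    fields = [cls_val_keyword, "0.00", "0", "0", "50", "50", "150", "150",
              f"{zl:.2f}", f"{yl:.2f}", f"{xl:.2f}",
              f"{yc:.2f}", f"{zc:.2f}", f"{xc:.2f}", f"{rz:.2f}"]
    if score is not None:                      # predictions carry a trailing confidence
        fields.append(f"{score:.2f}")
    return " ".join(fields)

def from_kitti_line(line):
    # Step 5 (get_label_annos-style reading): the dimension columns are
    # re-indexed, so the written (zl, yl, xl) comes back as (xl, zl, yl);
    # the location stays (yc, zc, xc).
    p = line.split(" ")
    dims_raw = [float(v) for v in p[8:11]]     # zl, yl, xl as written
    return {
        "name": p[0],
        "dimensions": [dims_raw[2], dims_raw[0], dims_raw[1]],  # xl, zl, yl
        "location": [float(v) for v in p[11:14]],               # yc, zc, xc
        "rotation_y": float(p[14]),
    }
```

Writing a box with xl=4.5, yl=1.9, zl=1.6 and reading it back yields dimensions [4.5, 1.6, 1.9], i.e. (xl, zl, yl) as stated in step 5.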

However, within calculate_iou_partly the default is z_axis=1, and both the location and the dimensions for the BEV IoU calculation are selected as

loc = np.concatenate([a["location"][:, bev_axes] for a in gt_annos_part], 0)
dims = np.concatenate([a["dimensions"][:, bev_axes] for a in gt_annos_part], 0)

with

bev_axes = list(range(3))
bev_axes.pop(z_axis)

which would mean that xl yl (dimensions), yc xc (location), and rz are fed to the IoU calculation. This seems counterintuitive, but no further comments are provided on this.
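
As a sanity check of that slicing, a small standalone example (not taken from the repository) reproduces the index arithmetic:

```python
# Standalone illustration of the bev_axes slicing with the default z_axis=1.
import numpy as np

z_axis = 1
bev_axes = list(range(3))
bev_axes.pop(z_axis)                         # -> [0, 2]

# Column order after get_label_annos (step 5 above):
dimensions = np.array([[4.5, 1.6, 1.9]])     # [xl, zl, yl]
location   = np.array([[2.0, -0.5, 30.0]])   # [yc, zc, xc]

print(dimensions[:, bev_axes])               # [[4.5 1.9]]  -> (xl, yl)
print(location[:, bev_axes])                 # [[ 2. 30.]]  -> (yc, xc)
# i.e. the rotated BEV boxes are built from (yc, xc, xl, yl, rz),
# which is the ordering described above.
```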

Summary
Therefore, it would be great to know which format is expected to be saved to the .txt files in step 4, and whether the subsequent IoU calculation is correct with the order described above. Moreover, it would be great if you could provide a simple example of a raw ground-truth annotation and the corresponding ground-truth .txt file.

DongHeePaek commented 9 months ago

Dear @ffent,

I sincerely appreciate you bringing this issue to our attention.

Firstly, we take this matter very seriously and are currently conducting a detailed investigation into the problem.

Secondly, on a positive note, our initial research (K-Radar & Enhanced K-Radar) only involved Sedan vehicles and used a metric with an IoU of 0.3. Therefore, we anticipate that incorporating a fix will not significantly alter the performance of our existing model. However, it is important to note that for an IoU of 0.5 or 0.7, or when considering the Bus or Truck classes, there could be considerable performance changes.

We will thoroughly verify these aspects and update the related contents on our GitHub page and arXiv accordingly.

DongHeePaek commented 9 months ago

To summarize, there are no issues with the existing evaluation (see the attached diagram, evaluation_issue).

Our concern was that when evaluating between ground-truth and prediction (on the far left of the diagram), the parameters xl and yl might be switched (i.e., the angle of the prediction or ground-truth could be rotated by 90 degrees), resulting in significantly lower performance than the actual one. However, in reality, the xl and yl parameters of both ground-truth and prediction change simultaneously (as shown on the far right of the diagram), so there is no impact on actual performance.

For instance, as you can see in ./tools/eval_checker/check_evaluation_with_label.py, when loading all labels and creating both predictions and ground-truth in K-Radar format for evaluation, we achieve 99.98~100% mAP for 0.98, 0.95, and 0.3 IoU (verified for the Car, Bus, and Truck classes). We have also validated the evaluation by converting to the OpenPCDet format (refer to line 76 of the code).

For more details, please refer to ./tools/eval_checker/check_evaluation_with_label.py. Thank you for sharing this issue.
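
Conceptually, the check does something like the following sketch. Note that load_kradar_labels and evaluate_kitti are placeholders standing in for the repository's own loading and evaluation functions, not the script's actual API.

```python
# Conceptual sketch of the self-check described above (placeholder helpers,
# not the script's actual API): feed every ground-truth label back in as a
# perfect prediction and confirm the evaluator reports ~100% mAP.
import copy

def check_eval_with_labels(load_kradar_labels, evaluate_kitti,
                           iou_thresholds=(0.3, 0.95, 0.98)):
    """load_kradar_labels() -> list of per-frame annotation dicts;
    evaluate_kitti(gt_annos, dt_annos, iou) -> mAP in percent.
    Both callables stand in for the repository's own loading/eval code."""
    gt_annos = load_kradar_labels()
    dt_annos = copy.deepcopy(gt_annos)
    for anno in dt_annos:
        # predictions additionally need a confidence score per box
        anno["score"] = [1.0] * len(anno["name"])
    for iou in iou_thresholds:
        mAP = evaluate_kitti(gt_annos, dt_annos, iou)
        # if the save/load ordering differed between ground-truth and
        # predictions, this would drop far below 100%
        assert mAP > 99.9, f"unexpected mAP {mAP:.2f} at IoU {iou}"
```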