kaist-avelab / K-Radar

4D Radar Object Detection for Autonomous Driving in Various Weather Conditions

Evaluation code #28

Open nightrome opened 6 months ago

nightrome commented 6 months ago

Hi. Thank you for the nice dataset. I am trying to find out how your evaluation code works in detail.

To evaluate on the test set, we have N=17536 samples/frames and C=7 conditions. So my assumption was that the Total AP is either an average over samples or an average over conditions. However, when running your evaluation code on our results, neither seems to be the case.

[Image: results table comparing the Total AP reported by the evaluation code with the average over samples and the average over conditions]

Note how Total (from your evaluation code) differs from both the average over samples and the average over conditions. Furthermore, our method outperforms the other method on only 2 of the 7 conditions (and only marginally), yet still achieves a higher Total score. How is that possible, if not due to a large imbalance in the frequency of the conditions? We already ruled that out by computing the average over samples.
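For concreteness, here is a minimal sketch of the two aggregations I compared. The per-condition AP values and frame counts below are placeholders, not the actual numbers from the table above; only the arithmetic matters.

```python
# Placeholder numbers only -- not actual K-Radar results.
# Keys follow the 7 K-Radar weather conditions.
ap_per_condition = {
    "normal": 50.0, "overcast": 48.0, "fog": 45.0, "rain": 47.0,
    "sleet": 40.0, "lightsnow": 42.0, "heavysnow": 38.0,
}
frames_per_condition = {
    "normal": 10000, "overcast": 2500, "fog": 1000, "rain": 1800,
    "sleet": 700, "lightsnow": 900, "heavysnow": 600,
}

# (a) unweighted mean over the C conditions
avg_over_conditions = sum(ap_per_condition.values()) / len(ap_per_condition)

# (b) frame-weighted mean, i.e. an average over the N samples
n_frames = sum(frames_per_condition.values())
avg_over_samples = sum(
    ap_per_condition[c] * frames_per_condition[c] for c in frames_per_condition
) / n_frames

print(f"average over conditions: {avg_over_conditions:.2f}")
print(f"average over samples:    {avg_over_samples:.2f}")
```

The Total reported by the evaluation code matches neither (a) nor (b).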

Furthermore, I also validated this against some of the results in https://arxiv.org/pdf/2303.06342.pdf, e.g. the strongest row in Table 2: there, too, the Total result is neither the average over samples nor the average over conditions.

Any feedback would be welcome.

DongHeePaek commented 6 months ago

Hi, @nightrome

Thank you for taking an interest in our K-Radar dataset and incorporating it into your research. It's gratifying to see our work being utilized in meaningful ways, especially in projects as significant as yours.

I'd like to clarify that the differences in results you're observing are likely due to discrepancies in the training and evaluation settings.

Our evaluation code for the K-Radar dataset primarily adapts the official KITTI evaluation code by traveller59, with modifications to suit the evaluation of multimodal data, including K-Radar.
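For context, the typical entry point of that KITTI-style evaluation code looks roughly like the sketch below. This is only an illustrative usage pattern, assuming the `get_official_eval_result` entry point from traveller59's kitti-object-eval-python; the module layout and the two loading helpers are hypothetical, and the actual K-Radar wrapper may differ.

```python
# Rough usage sketch of a KITTI-style evaluation call; not the exact K-Radar code.
from eval import get_official_eval_result  # assumed module layout

# gt_annos / dt_annos: one dict per frame in KITTI annotation format
# (fields such as 'name', 'bbox', 'dimensions', 'location', 'rotation_y',
# and 'score' for detections).
gt_annos = load_kradar_labels_as_kitti(split="test")   # hypothetical helper
dt_annos = load_predictions_as_kitti("results/test")   # hypothetical helper

# class index 0 is typically the car/sedan class in these adapted scripts
print(get_official_eval_result(gt_annos, dt_annos, current_classes=[0]))
```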

Key adjustments include:

  1. Coordinate System Adaptation: To accommodate the different coordinate systems of K-Radar and KITTI, we've implemented format conversion code. A question about this conversion was raised previously and the conversion was confirmed to be error-free; see Issue #23 in this repository.

  2. Region of Interest (RoI) Filtering: Our code filters RoIs frame by frame and evaluates only frames that contain detectable objects. Because the 4D radar's lateral field of view is limited to -53 to +53 degrees (with 0 degrees facing forward), objects outside this range are excluded from evaluation. This decision, reflected in our code (e.g., line 363 of kradar_detection_v2_1.py), ensures fairness by not penalizing the radar for objects beyond its physical measurement range. Frames that end up with no detectable objects are therefore omitted from evaluation to keep the comparison across modalities equitable (a minimal sketch of this filtering follows below).
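To illustrate the second point, here is a minimal sketch of such per-frame azimuth filtering, assuming ego-frame box centers with x pointing forward and y to the left; it is not the exact code from kradar_detection_v2_1.py.

```python
import math

AZIMUTH_LIMIT_DEG = 53.0  # radar field of view: -53 to +53 degrees, front = 0

def filter_frame_objects(objects):
    """Keep only objects inside the radar's azimuth coverage.

    `objects` is a hypothetical list of dicts with ego-frame 'x' (forward, m)
    and 'y' (left, m) coordinates of the box center.
    """
    kept = []
    for obj in objects:
        azimuth = math.degrees(math.atan2(obj["y"], obj["x"]))
        if abs(azimuth) <= AZIMUTH_LIMIT_DEG:
            kept.append(obj)
    return kept

def frames_for_evaluation(frames):
    """Drop frames that have no objects left after RoI filtering."""
    evaluated = []
    for frame_id, objects in frames.items():
        kept = filter_frame_objects(objects)
        if kept:  # frames without detectable objects are skipped entirely
            evaluated.append((frame_id, kept))
    return evaluated
```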

This approach acknowledges that physical constraints (like the RoI) affect how many frames are suitable for evaluation, and that this number differs across conditions. Simply averaging per-condition results without accounting for these factors may therefore not give a fair assessment.

We encourage you to experiment with the publicly released model under a unified specification for all experiments, and we will gladly help with debugging if you share your training configuration with us.

Thank you.