prophesee-ai / prophesee-automotive-dataset-toolbox

A set of Python scripts to evaluate the Automotive Datasets provided by Prophesee
Apache License 2.0

Weird results when evaluating GT bbox #29

Closed Wuziyi616 closed 1 year ago

Wuziyi616 commented 1 year ago

Hi, first I'd like to thank you for releasing this wonderful work. I'm working on event-based detection on the 1Mpx dataset. As a sanity check, I ran the evaluation code provided in the README and fed the GT boxes in as both the prediction results and the GT labels. I expected to see 100 mAP, but it actually gives 0.

After digging into the code, I think this is because of the hyper-parameter time_tol here. The annotation frequency of 1Mpx is 30/60 Hz, which is higher than the 20 Hz implied by the default time_tol=50000us, so when we do _match_times(), the predicted boxes get matched to GT boxes that are 1 frame before them. This mismatch results in 0 mAP. I changed time_tol to 5000us (= 200 Hz), and now I get 100 mAP.
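The mismatch described above can be sketched in plain Python. This is a hedged simplification, not the toolbox's actual _match_times() code: it just counts how many 60 Hz GT frame timestamps fall inside a +/- time_tol window around one anchor frame, showing that the 50 ms default spans several frames while 5 ms isolates a single one.

```python
# Hedged sketch (not the toolbox's actual matching code) of why a time_tol
# larger than the annotation period mixes boxes from neighbouring frames.
period_us = 1_000_000 // 60            # ~16667 us between 60 Hz GT frames
gt_ts = [i * period_us for i in range(10)]
anchor = gt_ts[5]

def frames_within(tol_us):
    # Count GT frames whose timestamp lies inside +/- tol_us of the anchor.
    return sum(abs(t - anchor) <= tol_us for t in gt_ts)

print(frames_within(50_000))   # default 50 ms tolerance spans 7 frames
print(frames_within(5_000))    # 5 ms (< 16.7 ms period) isolates 1 frame
```

With the default tolerance, boxes from adjacent frames are valid match candidates, so a shifted match (and hence 0 mAP on a GT-vs-GT sanity check) becomes possible.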

So I'm curious: when you evaluated your model, did you e.g. down-sample the annotation frequency, i.e. evaluate on fewer timestamps? (I didn't find this in the paper.) If not, didn't you run into this issue?

lbristiel-psee commented 1 year ago

Hello @Wuziyi616,

thanks for your feedback, and happy to read that you appreciate this dataset.

Indeed, the time tolerance parameter (time_tol) should be set as a function of the detection and GT frequencies. Basically, it should be smaller than the shorter of the two frame periods for the evaluation to be correct: time_tol <= min(1/gt_freq, 1/dt_freq).

In our case, the default value of 50000us (50ms) was OK because we run our model at 20Hz. But in your case you need to update the time_tol.
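The rule above can be turned into a small helper. This is an assumption for illustration, not part of the toolbox: it converts the two frequencies into the shorter frame period and takes a fraction of it as the tolerance, in microseconds.

```python
# Hypothetical helper (not in the toolbox) that derives a safe time_tol (us)
# from the GT and detection frequencies (Hz), per time_tol <= min(1/f_gt, 1/f_dt).
def safe_time_tol_us(gt_freq_hz, dt_freq_hz, margin=0.5):
    # The shorter of the two frame periods bounds the tolerance;
    # margin=0.5 keeps time_tol at half that period for headroom.
    shortest_period_us = 1e6 / max(gt_freq_hz, dt_freq_hz)
    return int(shortest_period_us * margin)

print(safe_time_tol_us(60, 60))   # 60 Hz GT -> 8333 us
print(safe_time_tol_us(20, 20))   # 20 Hz -> 25000 us, under the 50 ms default
```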

Hope this helps, Laurent

Wuziyi616 commented 1 year ago

Thanks for your reply!