nickgkan / butd_detr

Code for the ECCV22 paper "Bottom Up Top Down Detection Transformers for Language Grounding in Images and Point Clouds"

Eval on Referit3D metrics #32

Closed jiadingfang closed 1 year ago

jiadingfang commented 1 year ago

Hi authors,

First of all, thanks for the great work and the codebase; it is one of the few repos that can be run directly without any issues.

I noticed that your work currently ranks at the top of the official Referit3D benchmark, and I would like to use your eval code to run some tests on the Referit3D dataset. However, when I run your code with sh scripts/train_test_cls.sh (the setting with ground-truth bboxes), I find that the evaluation metric is still IoU rather than the accuracy required by the Referit3D benchmark. If I understand correctly, in the Referit3D setting the candidate 3D bboxes are given, and accuracy is measured as picking the right box from that list based on the referring text.

I wonder if there is a way to do that kind of eval within your codebase, and if not, what would be the best way to adapt it? Sorry if there are any misunderstandings; I would appreciate it if you could point them out.

nickgkan commented 1 year ago

Hi Jiading,

Thanks for your interest in our project! The metrics we report are actually accuracies under different IoU thresholds. In every case we regress a box; this is our prediction. We then compare it to the ground-truth box and count it as positive if the IoU is above the threshold.
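To make this concrete, here is a minimal sketch (not the actual eval code in this repo) of what "accuracy under an IoU threshold" means, assuming axis-aligned boxes in (x_min, y_min, z_min, x_max, y_max, z_max) format; the helper names are illustrative:

```python
import numpy as np

def iou_3d(box_a, box_b):
    """IoU of two axis-aligned 3D boxes given as (x_min, y_min, z_min, x_max, y_max, z_max)."""
    lo = np.maximum(box_a[:3], box_b[:3])          # lower corner of the intersection
    hi = np.minimum(box_a[3:], box_b[3:])          # upper corner of the intersection
    inter = np.prod(np.clip(hi - lo, 0.0, None))   # intersection volume (0 if boxes are disjoint)
    vol_a = np.prod(box_a[3:] - box_a[:3])
    vol_b = np.prod(box_b[3:] - box_b[:3])
    return inter / (vol_a + vol_b - inter)

def accuracy_at_thresholds(pred_boxes, gt_boxes, thresholds=(0.25, 0.5)):
    """Fraction of predicted boxes whose IoU with the ground-truth box reaches each threshold."""
    ious = np.array([iou_3d(p, g) for p, g in zip(pred_boxes, gt_boxes)])
    return {t: float((ious >= t).mean()) for t in thresholds}
```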

For Referit3D specifically, when given a pool of gt boxes, we compute the IoU of our predicted box and all given boxes. This can be interpreted as our way of scoring all given boxes (i.e., by how similar they are to our most confident prediction). Then, we keep the box with max IoU as our answer and check whether this coincides with Referit3D's ground-truth answer.
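As a rough sketch of that matching step (again illustrative names, reusing the hypothetical iou_3d helper from the snippet above, not our actual implementation):

```python
import numpy as np

def referit3d_accuracy(pred_boxes, candidate_pools, target_indices):
    """For each sample, score the given candidate boxes by IoU with our predicted box,
    pick the argmax, and count it correct if it is the annotated target box."""
    correct = 0
    for pred, candidates, target_idx in zip(pred_boxes, candidate_pools, target_indices):
        scores = [iou_3d(pred, cand) for cand in candidates]  # one score per given gt box
        correct += int(int(np.argmax(scores)) == target_idx)
    return correct / len(pred_boxes)
```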

I hope this helps!