kujason / avod

Code for 3D object detection for autonomous driving
MIT License

question regarding 2D IoU in BEV #21

Closed · johleh closed this issue 6 years ago

johleh commented 6 years ago

Hello,

Thank you for releasing the code to your paper!

"Background anchors are determined by calculating the 2D IoU in BEV between the anchors and the ground truth bounding boxes. For the car class, anchors with IoU less than 0.3 are considered background anchors, while ones with IoU greater than 0.5"

I am trying to figure out how you overcame the problem of computing IoU for non-axis-aligned rectangles when determining negative and positive anchors. The calculation uses two box_list objects. Could you please point me towards the box_list generation for the ground truth labels, or help me understand the process with a few words about the content of these box_lists? Is it an IoU calculation between axis-aligned bounding boxes around the ground truth boxes and the anchor prediction boxes?

Regards, Johannes

melfm commented 6 years ago

Hello,

The short answer is yes, the IoU calculation is between non-oriented anchors and ground-truth boxes. This is because the RPN regresses non-oriented proposals; hence, during the second stage, we re-calculate the IoUs of the regressed anchors.
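In other words, the anchor assignment reduces to an axis-aligned IoU check in the BEV plane. For later readers, here is a minimal NumPy sketch of that check; this is an illustration only, not the repository's code, and the (x1, z1, x2, z2) box layout and function name are assumptions:

```python
import numpy as np

def bev_iou_axis_aligned(anchors, gt_boxes):
    """Pairwise 2D IoU in BEV between axis-aligned boxes.

    Both inputs are hypothetical [N, 4] arrays of (x1, z1, x2, z2)
    corners on the ground plane, i.e. the oriented boxes have already
    been reduced to axis-aligned extents.
    Returns an [num_anchors, num_gt] IoU matrix.
    """
    # Intersection rectangle, broadcasting anchors against ground-truth boxes
    x1 = np.maximum(anchors[:, None, 0], gt_boxes[None, :, 0])
    z1 = np.maximum(anchors[:, None, 1], gt_boxes[None, :, 1])
    x2 = np.minimum(anchors[:, None, 2], gt_boxes[None, :, 2])
    z2 = np.minimum(anchors[:, None, 3], gt_boxes[None, :, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(z2 - z1, 0, None)

    area_a = (anchors[:, 2] - anchors[:, 0]) * (anchors[:, 3] - anchors[:, 1])
    area_g = (gt_boxes[:, 2] - gt_boxes[:, 0]) * (gt_boxes[:, 3] - gt_boxes[:, 1])
    union = area_a[:, None] + area_g[None, :] - inter
    return inter / np.maximum(union, 1e-9)

# Anchor assignment with the thresholds quoted from the paper:
# ious = bev_iou_axis_aligned(anchors_bev, gt_bev)
# max_iou = ious.max(axis=1)
# background = max_iou < 0.3
# positive = max_iou > 0.5
```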

johleh commented 6 years ago

Thank you for the quick reply. I hope I understand how it works now:

RPN (background/objects): no rotation, learns position and dimensions

2nd stage: takes the regressed proposals, recalculates IoU, and learns the orientation

Right? Is that a standard approach, the obvious approach, or something you came up with out of necessity to deal with IoU and rotations?

melfm commented 6 years ago

This approach simplifies the proposal generation: orientation is learned at a later stage. The aim is to use the RPN (which is also a smaller network compared to the second stage) to reduce the search space, and then to further regress the relevant proposals and also learn the orientation.

johleh commented 6 years ago

I am sorry, I did not write clearly about what my question focused on. I didn't mean to ask about the two-stage approach; I just described the model to check whether I understand it correctly.

Most papers on the KITTI 3D benchmark mention IoU thresholds as the hard negative mining criterion, but do not explicitly state how to compute the IoU. I suspect nobody computes exact IoU values for thousands of anchors, because calculating the exact IoU of non-axis-aligned rectangles with something like polygon clipping is too costly.

What I meant to ask is how you came up with the solution of rotating all boxes to either 0°/180° or 90°/270°: is that just the standard thing to do for someone with experience in computer vision? You use this alignment step, I think, and like all simplifications it is a trade-off. It should work well for most ground truth labels in KITTI, but in general it could be suboptimal for objects (e.g. augmented labels) that occur at diagonal orientations (45°, 135°, ...), right?
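For what it's worth, here is a rough sketch of what such a snap-to-nearest-axis step could look like. This is a hypothetical helper for illustration, not the AVOD code, and the argument names (centroid_xz, length, width, ry) are my own:

```python
import numpy as np

def snap_box_to_nearest_axis(centroid_xz, length, width, ry):
    """Reduce an oriented BEV box to an axis-aligned one for IoU checks.

    Hypothetical helper: rotates the box about its centroid to the
    nearest multiple of 90 degrees, so a box near 0/180 deg keeps
    (length, width) while a box near 90/270 deg swaps them.
    Returns (x1, z1, x2, z2) corners on the ground plane.
    """
    # Nearest axis: 0 -> keep length/width, 1 -> swap them
    nearest = int(round(ry / (np.pi / 2))) % 2
    l, w = (length, width) if nearest == 0 else (width, length)

    cx, cz = centroid_xz
    return (cx - l / 2, cz - w / 2, cx + l / 2, cz + w / 2)

# A 45 deg box is the worst case for this simplification: it gets snapped
# to whichever axis is (marginally) closer, so the axis-aligned stand-in
# deviates most from the true footprint there.
```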

asharakeh commented 6 years ago

@johleh to answer your question, this method is in fact optimal for our architecture. You are rotating the box around the centroid, so if there is an error in the centroid estimation, it is still captured by the orthogonal IoU calculation. Similarly, the size error is also captured by this simplification. Since proposals are not oriented, we expect a large orientation error anyway, so it does not make sense to use an oriented IoU calculation: the number of positive proposals under that method would be almost zero at a 0.5 IoU threshold.

Hope this clarifies why our choice is valid for our architecture.

johleh commented 6 years ago

Thank you very much for the explanation.