@tianweiy Thanks for your attention to the 3D detection part.
For the ground-truth encoding, we follow the encoding method in SECOND or mmdetection3d.
We only modify the input encoding of the coordinates (x, y, z), using a cylindrical partition instead of a cubic partition.
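For concreteness, a minimal sketch of what this cylindrical partition can look like in numpy; the helper names (`cart2polar`, `cylindrical_voxel_indices`) and the bin layout are illustrative assumptions, not the exact implementation:

```python
import numpy as np

def cart2polar(points):
    """Convert Cartesian (x, y, z) points to cylindrical (rho, phi, z)."""
    rho = np.sqrt(points[:, 0] ** 2 + points[:, 1] ** 2)
    phi = np.arctan2(points[:, 1], points[:, 0])
    return np.stack([rho, phi, points[:, 2]], axis=1)

def cylindrical_voxel_indices(points, pc_range, grid_size):
    """Bin points uniformly in (rho, phi, z) instead of (x, y, z).

    pc_range:  [rho_min, phi_min, z_min, rho_max, phi_max, z_max]
    grid_size: (n_rho, n_phi, n_z) -- illustrative choices
    """
    polar = cart2polar(points)
    low, high = np.asarray(pc_range[:3]), np.asarray(pc_range[3:])
    cell = (high - low) / np.asarray(grid_size)
    idx = ((polar - low) / cell).astype(np.int64)
    return np.clip(idx, 0, np.asarray(grid_size) - 1)
```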
thanks!
Sorry, one follow-up question. Does this mean that you transform (x, y, z) back to the original Cartesian 3D coordinates before computing any losses?
Basically, after cylindrical convolution, you get a voxel feature map in the cylindrical view, and then you flatten it to BEV plus some 2D conv layers. All of this is still in cylindrical coordinates, right? After a detection head, you get some output. What coordinates is that output in? Is the target assignment done in cylindrical or original coordinates?
I think the cylindrical voxel feature map is transformed into a BEV feature map, and then some 2D conv layers predict the 3D box, as SECOND does.
"the cylindrical voxel feature map is transformed into a BEV feature map" <- this is not trivial, right? Won't you get some overlapping feature values?
@tianweiy Since I tried two versions of backbones, SECOND (3D representation and backbone) and SSN (bird's-eye-view representation, like pillars), we use two methods to get the features. For 3D, we just keep the cylindrical features; for the 2D bird's-eye view, we collapse the height dimension as a whole, like PointPillars.
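To illustrate the bird's-eye-view variant, here is a sketch of collapsing the height dimension on a dense feature tensor (the function name and shapes are assumptions; a sparse implementation would scatter into the grid first):

```python
import torch

def cylindrical_to_bev(voxel_feats: torch.Tensor) -> torch.Tensor:
    """Collapse the height axis of a dense cylindrical voxel feature map.

    voxel_feats: (B, C, D, H, W), where D indexes the z-bins and (H, W)
    index the (rho, phi) grid. Folding D into the channel axis yields a
    (B, C*D, H, W) BEV map for the 2D conv layers -- the same flattening
    SECOND applies to its cubic grid, except that the two spatial axes
    here are radius and azimuth rather than x and y.
    """
    B, C, D, H, W = voxel_feats.shape
    return voxel_feats.reshape(B, C * D, H, W)
```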
Thanks. So just to be clear: in the end, the ground-truth boxes are also transformed to the cylindrical view, and then you do 3D detection following the anchor-based methods in SECOND / mmdet3d?
I am just wondering how you compute the loss. Are the two boxes in the cylindrical view or the original view?
@tianweiy For anchor encoding and loss, both of them are in the cylindrical view.
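For reference, the SECOND/mmdetection3d residual encoding that both replies point to looks like the sketch below; per the answer above, anchor and ground-truth boxes would already be expressed in cylindrical coordinates before this step, so the formulas themselves are unchanged (velocity residuals, when present, are plain differences):

```python
import numpy as np

def second_box_encode(gt, anchor):
    """SECOND-style residual encoding; boxes are [x, y, z, w, l, h, theta].

    With a cylindrical partition, x/y/z here would already be the
    cylindrical coordinates rather than Cartesian ones.
    """
    xg, yg, zg, wg, lg, hg, rg = gt
    xa, ya, za, wa, la, ha, ra = anchor
    diag = np.sqrt(wa ** 2 + la ** 2)  # anchor diagonal normalizes x/y offsets
    return np.array([
        (xg - xa) / diag,
        (yg - ya) / diag,
        (zg - za) / ha,
        np.log(wg / wa),
        np.log(lg / la),
        np.log(hg / ha),
        rg - ra,
    ])
```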
I see. Thanks, then back to the first question. Is the velocity/rotation transformed to the cylindrical view?
Yes, these targets are also in the cylindrical view.
I see. Thanks a lot for your reply!
"Yes, these targets are also in the cylindrical view." @xinge008 Hi, Xinge.
I still do not understand exactly how this is done. Can you show me the formulas used to compute the detection targets?
In my opinion, the center is transformed to the cylindrical view by cart2polar, and the heading angle is also rotated to the cylindrical view. But how do you formulate the bounding-box size in the cylindrical view?
Thanks.
For 3D detection, we regress [x, y, z, w, l, h, vx, vy, theta].
x, y, z should be transformed into the cylindrical view. But what about the remaining regression targets? Do you just regress the original values, or apply some transformation?
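To make the question concrete, here is a hypothetical sketch that applies only what the thread has confirmed so far: the center goes through cart2polar, and heading/velocity are expressed in the cylindrical view. How w, l, h are handled is exactly the open question, so they are passed through unchanged here, which is purely an assumption:

```python
import numpy as np

def box_to_cylindrical(box):
    """Transform [x, y, z, w, l, h, vx, vy, theta] into the cylindrical view."""
    x, y, z, w, l, h, vx, vy, theta = box
    rho, phi = np.hypot(x, y), np.arctan2(y, x)
    # Heading relative to the local frame whose first axis points
    # radially outward at azimuth phi.
    theta_cyl = theta - phi
    # Rotate the velocity vector into that same local frame.
    v_rho = np.cos(phi) * vx + np.sin(phi) * vy
    v_phi = -np.sin(phi) * vx + np.cos(phi) * vy
    # w, l, h untouched -- whether/how box size should be reformulated
    # in the cylindrical view is the unanswered question above.
    return np.array([rho, phi, z, w, l, h, v_rho, v_phi, theta_cyl])
```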