@tianweiy Thanks for your attention to the 3D detection part.
For the ground-truth encoding, we follow the encoding method in SECOND or mmdetection3d.
We only modify the input encoding of the coordinates (x, y, z), using a cylindrical partition instead of a cubic partition.
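For concreteness, a minimal sketch of what this cylindrical partition can look like in numpy; the helper names (`cart2polar`, `cylindrical_voxel_indices`) and the bin layout are illustrative assumptions, not the exact implementation:

```python
import numpy as np

def cart2polar(points):
    """Convert Cartesian (x, y, z) points to cylindrical (rho, phi, z)."""
    rho = np.sqrt(points[:, 0] ** 2 + points[:, 1] ** 2)
    phi = np.arctan2(points[:, 1], points[:, 0])
    return np.stack([rho, phi, points[:, 2]], axis=1)

def cylindrical_voxel_indices(points, pc_range, grid_size):
    """Bin points uniformly in (rho, phi, z) instead of (x, y, z).

    pc_range:  [rho_min, phi_min, z_min, rho_max, phi_max, z_max]
    grid_size: (n_rho, n_phi, n_z) -- illustrative choices
    """
    polar = cart2polar(points)
    low, high = np.asarray(pc_range[:3]), np.asarray(pc_range[3:])
    cell = (high - low) / np.asarray(grid_size)
    idx = ((polar - low) / cell).astype(np.int64)
    return np.clip(idx, 0, np.asarray(grid_size) - 1)
```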
thanks!
Sorry, one follow-up question. Does this mean that you transform (x, y, z) back to the original Cartesian 3D coordinates before computing any losses?
Basically, after cylindrical convolution, you get a voxel feature map in the cylindrical view, and then you flatten it to BEV plus some 2D conv layers. All of this is still in cylindrical coordinates, right? After a detection head, you get some output. What coordinates is that output in? Is the target assignment done in cylindrical or original coordinates?
I think the cylindrical voxel feature map is transformed into a BEV feature map, and then some 2D conv layers predict the 3D box, as SECOND does.
"the cylindrical voxel feature map is transformed into a BEV feature map" <- this is not trivial, right? Won't you get some overlapping feature values?
@tianweiy Since I tried two versions of backbones, SECOND (3D representation and backbone) and SSN (bird's-eye-view representation, like pillars), we use two methods to get the features. For 3D, we just keep the cylindrical features; for the 2D bird's-eye view, we collapse the height dimension as a whole, like PointPillars.
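To illustrate the bird's-eye-view variant, here is a sketch of collapsing the height dimension on a dense feature tensor (the function name and shapes are assumptions; a sparse implementation would scatter into the grid first):

```python
import torch

def cylindrical_to_bev(voxel_feats: torch.Tensor) -> torch.Tensor:
    """Collapse the height axis of a dense cylindrical voxel feature map.

    voxel_feats: (B, C, D, H, W), where D indexes the z-bins and (H, W)
    index the (rho, phi) grid. Folding D into the channel axis yields a
    (B, C*D, H, W) BEV map for the 2D conv layers -- the same flattening
    SECOND applies to its cubic grid, except that the two spatial axes
    here are radius and azimuth rather than x and y.
    """
    B, C, D, H, W = voxel_feats.shape
    return voxel_feats.reshape(B, C * D, H, W)
```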
Thanks. So just to be clear: in the end, the ground-truth boxes are also transformed to the cylindrical view, and then you do 3D detection following the anchor-based methods in SECOND / mmdet3d?
I am just wondering how you compute the loss. Are the two boxes in the cylindrical view or the original view?
@tianweiy For anchor encoding and loss, both of them are in the cylindrical view.
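For reference, the SECOND/mmdetection3d residual encoding that both replies point to looks like the sketch below; per the answer above, anchor and ground-truth boxes would already be expressed in cylindrical coordinates before this step, so the formulas themselves are unchanged (velocity residuals, when present, are plain differences):

```python
import numpy as np

def second_box_encode(gt, anchor):
    """SECOND-style residual encoding; boxes are [x, y, z, w, l, h, theta].

    With a cylindrical partition, x/y/z here would already be the
    cylindrical coordinates rather than Cartesian ones.
    """
    xg, yg, zg, wg, lg, hg, rg = gt
    xa, ya, za, wa, la, ha, ra = anchor
    diag = np.sqrt(wa ** 2 + la ** 2)  # anchor diagonal normalizes x/y offsets
    return np.array([
        (xg - xa) / diag,
        (yg - ya) / diag,
        (zg - za) / ha,
        np.log(wg / wa),
        np.log(lg / la),
        np.log(hg / ha),
        rg - ra,
    ])
```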
I see. Thanks, then back to the first question. Is the velocity/rotation transformed to the cylindrical view?
Yes, these targets are also in the cylindrical view.
I see. Thanks a lot for your reply!
"Yes, these targets are also in the cylindrical view." @xinge008 Hi, Xinge.
I still do not understand exactly how this is done. Can you show me the formulas used to compute the detection targets?
In my opinion, the center is transformed to the cylindrical view by cart2polar, and the heading angle is also rotated to the cylindrical view. But how do you formulate the bounding-box size in the cylindrical view?
Thanks.
For 3D detection, we regress [x, y, z, w, l, h, vx, vy, theta].
x, y, z should be transformed into the cylindrical view. But what about the remaining regression targets? Do you just regress the original values, or apply some transformation?
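To make the question concrete, here is a hypothetical sketch that applies only what the thread has confirmed so far: the center goes through cart2polar, and heading/velocity are expressed in the cylindrical view. How w, l, h are handled is exactly the open question, so they are passed through unchanged here, which is purely an assumption:

```python
import numpy as np

def box_to_cylindrical(box):
    """Transform [x, y, z, w, l, h, vx, vy, theta] into the cylindrical view."""
    x, y, z, w, l, h, vx, vy, theta = box
    rho, phi = np.hypot(x, y), np.arctan2(y, x)
    # Heading relative to the local frame whose first axis points
    # radially outward at azimuth phi.
    theta_cyl = theta - phi
    # Rotate the velocity vector into that same local frame.
    v_rho = np.cos(phi) * vx + np.sin(phi) * vy
    v_phi = -np.sin(phi) * vx + np.cos(phi) * vy
    # w, l, h untouched -- whether/how box size should be reformulated
    # in the cylindrical view is the unanswered question above.
    return np.array([rho, phi, z, w, l, h, v_rho, v_phi, theta_cyl])
```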