megvii-research / PETR

[ECCV2022] PETR: Position Embedding Transformation for Multi-View 3D Object Detection & [ICCV2023] PETRv2: A Unified Framework for 3D Perception from Multi-Camera Images

Normalized 3D coordinates or un-normalized coordinates? #121

Closed: sithu31296 closed this issue 1 year ago

sithu31296 commented 1 year ago

In the paper, it is mentioned that the coordinates of the 3D space are normalized to [0, 1]. But when I looked at the code, the coordinates are un-normalized after the regression head. Did I miss something?

Also, what is the accuracy difference between these two?

yingfei1016 commented 1 year ago

Hi, the coordinates are normalized for the 3D PE (see https://github.com/megvii-research/PETR/blob/main/projects/mmdet3d_plugin/models/dense_heads/petr_head.py#L316). The 3D PE is added to the image features, whereas the regression head operates on the object queries. So the normalization has no relation to the regression head.
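As an illustration of what "normalized for the 3D PE" means, here is a minimal sketch of per-axis normalization of a metric 3D point into [0, 1]. It assumes a range given as `[x_min, y_min, z_min, x_max, y_max, z_max]` in metres. The `POSITION_RANGE` values and the `normalize_coords` helper are illustrative assumptions and are not taken from the linked line.

```python
# Illustrative metric range in metres: [x_min, y_min, z_min, x_max, y_max, z_max].
# Assumed for this sketch; check the config you are actually using.
POSITION_RANGE = [-61.2, -61.2, -10.0, 61.2, 61.2, 10.0]


def normalize_coords(point, position_range=POSITION_RANGE):
    """Map a metric (x, y, z) point into [0, 1] per axis (hypothetical helper)."""
    lo = position_range[:3]
    hi = position_range[3:]
    return [(p - l) / (h - l) for p, l, h in zip(point, lo, hi)]


# The centre of the range maps to 0.5 on every axis; the minimum corner maps to 0.
print(normalize_coords([0.0, 0.0, 0.0]))        # [0.5, 0.5, 0.5]
print(normalize_coords([-61.2, -61.2, -10.0]))  # [0.0, 0.0, 0.0]
```

Because these normalized coordinates only feed the position encoding that is added to the image features, nothing here touches the box-regression targets.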

I think the normalization is not necessary. For example, direction vectors can also be used to generate the 3D PE (see https://github.com/TRI-ML/VEDet/blob/main/projects/mmdet3d_plugin/models/dense_heads/vedet_head.py#LL251C57-L251C57). VEDet is a more elegant approach, and it comes without a performance sacrifice.
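To see why direction vectors avoid the need for a normalization range, here is a minimal sketch of a unit ray direction through one pixel. It assumes a pinhole camera with intrinsics `fx, fy, cx, cy` and a camera-to-ego rotation matrix. The function name and the numbers are hypothetical, and this is not VEDet's actual code.

```python
import math


def pixel_ray_direction(u, v, fx, fy, cx, cy, rotation_cam_to_ego):
    """Unit ray direction in the ego frame through pixel (u, v) (hypothetical helper)."""
    # Back-project the pixel to a camera-frame direction at unit depth.
    d_cam = [(u - cx) / fx, (v - cy) / fy, 1.0]
    # Rotate into the ego frame (translation does not affect a direction).
    d_ego = [sum(r * d for r, d in zip(row, d_cam)) for row in rotation_cam_to_ego]
    norm = math.sqrt(sum(c * c for c in d_ego))
    # Components of a unit vector already lie in [-1, 1], so no metric range is needed.
    return [c / norm for c in d_ego]


identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
# The principal point looks straight down the optical axis.
print(pixel_ray_direction(800.0, 450.0, 1266.0, 1266.0, 800.0, 450.0, identity))  # [0.0, 0.0, 1.0]
```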

sithu31296 commented 1 year ago

I mean the object queries (in the regression head), not the coordinates in the 3D PE.

yingfei1016 commented 1 year ago

The reference points are initialized between 0 and 1 (see https://github.com/megvii-research/PETR/blob/main/projects/mmdet3d_plugin/models/dense_heads/petr_head.py#L276). After training, their values remain almost entirely within this range.
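The following plain-Python sketch shows uniform initialization of learnable 3D reference points in [0, 1]. It also includes a helper that checks how many coordinates have drifted outside that range, which is the property described above. The query count of 900 and both helper names are illustrative assumptions. A real model would store these points as trainable parameters rather than plain lists.

```python
import random


def init_reference_points(num_query, seed=0):
    """Draw num_query (x, y, z) reference points uniformly from [0, 1) (hypothetical helper)."""
    rng = random.Random(seed)
    return [[rng.uniform(0.0, 1.0) for _ in range(3)] for _ in range(num_query)]


def fraction_outside_unit_range(points):
    """Fraction of coordinates that lie outside [0, 1]."""
    coords = [c for p in points for c in p]
    return sum(1 for c in coords if c < 0.0 or c > 1.0) / len(coords)


refs = init_reference_points(num_query=900)
print(fraction_outside_unit_range(refs))  # 0.0 right after initialization
```

Running the same check on the trained reference points from a checkpoint would quantify "almost within this range."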

sithu31296 commented 1 year ago

Sorry for the confusion. After this line (https://github.com/megvii-research/PETR/blob/main/projects/mmdet3d_plugin/models/dense_heads/petr_head.py#L441), the coordinates are un-normalized. What I wanted to ask is: are these un-normalized coordinates used in the loss calculation, or are they normalized again somewhere before the loss is computed?
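The sketch below illustrates the general decode pattern. A normalized reference point is refined by an offset in logit space, squashed back to (0, 1), and then un-normalized to metres. The detection range `PC_RANGE`, the helper names, and the one-offset-per-axis layout are assumptions for illustration. PETR's actual regression output orders the box parameters differently, so this is not a transcription of the linked code.

```python
import math

# Assumed detection range in metres: [x_min, y_min, z_min, x_max, y_max, z_max].
PC_RANGE = [-51.2, -51.2, -5.0, 51.2, 51.2, 3.0]


def inverse_sigmoid(x, eps=1e-5):
    """Logit of x, clamped so that inputs at exactly 0 or 1 stay finite."""
    x = min(max(x, 0.0), 1.0)
    return math.log(max(x, eps) / max(1.0 - x, eps))


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def decode_center(reference_point, offset_logits, pc_range=PC_RANGE):
    """Refine a normalized reference point and map it to metres (hypothetical helper)."""
    metric = []
    for axis in range(3):
        # Refinement happens in logit space, so the result stays inside (0, 1).
        normalized = sigmoid(inverse_sigmoid(reference_point[axis]) + offset_logits[axis])
        lo, hi = pc_range[axis], pc_range[axis + 3]
        # Un-normalize: [0, 1] -> [lo, hi] in metres.
        metric.append(normalized * (hi - lo) + lo)
    return metric


print(decode_center([0.5, 0.5, 0.5], [0.0, 0.0, 0.0]))  # [0.0, 0.0, -1.0]
```

With this pattern, a loss computed on the output of `decode_center` is measured in metres.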

yingfei1016 commented 1 year ago

For loss calculation, the coordinates are un-normalized.

sithu31296 commented 1 year ago

Is there any reason you use the un-normalized coordinates? Thank you so much for replying.

yingfei1016 commented 1 year ago

It is common practice to calculate the loss on non-normalized coordinates, e.g. in DETR3D and BEVFormer. In my opinion, depth estimation is the most critical part of BEV detection, and using non-normalized coordinates directly can make position learning better.

Besides, the loss scale of non-normalized coordinates is appropriate. For example, if the distance between the prediction and the GT is 1 m, then the loss is 1. The initial loss is about 1 to 2, which I think is appropriate (close to loss_cls). With normalized coordinates, the loss weight needs to be carefully adjusted.
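The loss-scale argument can be checked with a small worked example. It assumes a plain, unweighted L1 loss on a single coordinate and an x-range of ±51.2 m. Both are illustrative assumptions, and the real PETR loss applies its own weights. A 1 m error yields a loss of 1.0 in metres but only about 1/102.4 after normalization. Normalized coordinates would therefore need a loss weight roughly 100 times larger to reach the same scale.

```python
# Assumed x-axis detection range in metres (illustrative).
X_MIN, X_MAX = -51.2, 51.2


def l1(pred, target):
    """Unweighted L1 loss for a single coordinate."""
    return abs(pred - target)


def normalize_x(x):
    """Map a metric x coordinate into [0, 1] using the assumed range."""
    return (x - X_MIN) / (X_MAX - X_MIN)


pred_x, gt_x = 11.0, 10.0  # prediction is 1 m off along x
loss_metric = l1(pred_x, gt_x)
loss_normalized = l1(normalize_x(pred_x), normalize_x(gt_x))

print(loss_metric)                              # 1.0
print(round(loss_normalized, 6))                # 0.009766, i.e. 1 / 102.4
print(round(loss_metric / loss_normalized, 1))  # weight needed to match: 102.4
```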

sithu31296 commented 1 year ago

Thank you, I will close the issue for now!