megvii-research / PETR

[ECCV2022] PETR: Position Embedding Transformation for Multi-View 3D Object Detection & [ICCV2023] PETRv2: A Unified Framework for 3D Perception from Multi-Camera Images

Can a model trained on one set of extrinsics be applied to another setup? #102

Open mdfaheem786 opened 1 year ago

mdfaheem786 commented 1 year ago

Hi, great work. I have learned a lot since I started using it.

I want to understand the significance of the extrinsic matrix (lidar2cam) for model training. Since the 3D frustum points and image features are passed to an MLP to generate the position embedding (PE), I have the following questions (my rough understanding of this flow is sketched after them):

How would a change in extrinsics (e.g. a different vehicle) affect model performance?
Can a model trained on one set of extrinsics be applied to another setup by passing the appropriate frustum at inference time?
If training is coupled to the extrinsic setup, any thoughts on generalizing to different extrinsic setups?
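
For context, here is my rough understanding of how the frustum and the extrinsics enter the 3D PE. This is a minimal sketch only, not the repository code; the function name, arguments, and the depth/range values are illustrative assumptions:

```python
import torch

def position_embedding_sketch(img2lidars, mlp, feat_h, feat_w, img_h, img_w,
                              depth_bins=64,
                              pc_range=(-51.2, -51.2, -5.0, 51.2, 51.2, 3.0)):
    """Illustrative sketch: frustum points -> lidar/ego frame -> MLP -> 3D PE."""
    # 1) Build a camera frustum: a (u, v, d) point for every feature location
    #    and depth bin, in pixel coordinates at the feature stride.
    u = (torch.arange(feat_w, dtype=torch.float32) + 0.5) * img_w / feat_w
    v = (torch.arange(feat_h, dtype=torch.float32) + 0.5) * img_h / feat_h
    d = torch.linspace(1.0, 61.2, depth_bins)                   # assumed depth range
    vv, uu, dd = torch.meshgrid(v, u, d, indexing="ij")         # each (H, W, D)
    frustum = torch.stack([uu * dd, vv * dd, dd, torch.ones_like(dd)], dim=-1)

    # 2) Lift to the lidar/ego frame with img2lidar = (intrinsic @ lidar2cam)^-1.
    #    img2lidars: (num_cams, 4, 4) -- this is where the extrinsics enter.
    pts = torch.einsum("nij,hwdj->nhwdi", img2lidars, frustum)[..., :3]

    # 3) Normalize into the detection range so the MLP input is bounded.
    lo = pts.new_tensor(pc_range[:3])
    hi = pts.new_tensor(pc_range[3:])
    pts = (pts - lo) / (hi - lo)

    # 4) Flatten the depth axis and let an MLP (input dim = depth_bins * 3)
    #    produce the per-pixel 3D position embedding.
    return mlp(pts.flatten(-2))                                  # (num_cams, H, W, C)
```

If this picture is right, only the `img2lidars` matrices change when the extrinsics change, which is what motivates the questions above.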
yingfei1016 commented 1 year ago

Hi,

In the training process, we randomly rotate the extrinsics (https://github.com/megvii-research/PETR/blob/main/projects/configs/petr/petr_vovnet_gridmask_p4_800x320.py#L161) to improve generalization. Therefore, in theory, the model can be used with different extrinsics: a different frustum is generated when different extrinsics are given as input.
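
Conceptually, this kind of augmentation amounts to something like the sketch below: apply a random yaw to the lidar/ego frame, fold it into each camera's projection matrix, and rotate the ground-truth boxes consistently. This is illustrative only (names and the box layout are assumptions); see the linked config for the actual transform:

```python
import math
import torch

def random_extrinsic_rotation(lidar2img_mats, gt_bboxes_3d,
                              rot_range=(-0.3925, 0.3925)):
    """Sketch: rotate the lidar/ego frame by a random yaw and compensate in
    every camera's lidar2img matrix so the projection stays consistent.
    gt_bboxes_3d is assumed to be an (N, 7) float tensor [x, y, z, w, l, h, yaw]."""
    angle = torch.empty(1).uniform_(*rot_range).item()
    c, s = math.cos(angle), math.sin(angle)
    # 4x4 rotation of the lidar frame about the Z axis.
    rot = torch.tensor([[c, -s, 0.0, 0.0],
                        [s,  c, 0.0, 0.0],
                        [0.0, 0.0, 1.0, 0.0],
                        [0.0, 0.0, 0.0, 1.0]])
    # Points are rotated to p' = R p, so the new projection is lidar2img @ R^-1.
    new_mats = [m @ rot.inverse() for m in lidar2img_mats]
    # Rotate box centers and add the angle to the yaw (sign convention may differ).
    gt_bboxes_3d[:, :3] = gt_bboxes_3d[:, :3] @ rot[:3, :3].T
    gt_bboxes_3d[:, 6] += angle
    return new_mats, gt_bboxes_3d, angle
```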

However, in practical applications the performance degradation is still relatively large. This is due to the different extrinsics and to other factors: 1) the domain gap, such as different data scenes and camera FOV; 2) the 2D PE adopted in the original PETR (https://github.com/megvii-research/PETR/blob/main/projects/configs/petr/petr_vovnet_gridmask_p4_800x320.py#L48). We use a multi-view 2D PE. It brings little performance improvement and also hurts speed and generalization, so we removed it in subsequent versions (StreamPETR).
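
For reference, the 2D PE referred to here is a per-view sine/cosine position embedding in the style of DETR. The following is a minimal sketch, not the repository code:

```python
import torch

def sine_2d_pe(feat_h, feat_w, num_feats=128, temperature=10000):
    """Sketch of a DETR-style 2D sine position embedding for one camera view.
    Returns (2 * num_feats, feat_h, feat_w); in the multi-view case it would be
    computed per camera, optionally combined with a per-camera embedding."""
    y = torch.arange(feat_h, dtype=torch.float32).unsqueeze(1).expand(feat_h, feat_w)
    x = torch.arange(feat_w, dtype=torch.float32).unsqueeze(0).expand(feat_h, feat_w)
    dim_t = torch.arange(num_feats, dtype=torch.float32)
    dim_t = temperature ** (2 * (dim_t // 2) / num_feats)
    pos_x = x[..., None] / dim_t
    pos_y = y[..., None] / dim_t
    # Interleave sine and cosine over the channel dimension.
    pos_x = torch.stack((pos_x[..., 0::2].sin(), pos_x[..., 1::2].cos()), dim=-1).flatten(-2)
    pos_y = torch.stack((pos_y[..., 0::2].sin(), pos_y[..., 1::2].cos()), dim=-1).flatten(-2)
    return torch.cat((pos_y, pos_x), dim=-1).permute(2, 0, 1)
```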

mdfaheem786 commented 1 year ago

Hi @yingfei1016,

I tried to adapt it to the Lyft dataset by changing the config as described below.

Attached are my config and a sample output from PETR (detection only), but the detected boxes appear shifted and scaled. Please kindly point out what I am doing wrong.

Thanking you.

__CAM_FRONT_LEFT__host-a101_cam5_1241893240033330006_pred

petrv2_BEVseg_lyf_py.txt