Hi,
In the training process, we randomly rotate the extrinsics (https://github.com/megvii-research/PETR/blob/main/projects/configs/petr/petr_vovnet_gridmask_p4_800x320.py#L161) to improve generalization. Therefore, in theory, the model can be used with different extrinsics: a different frustum is generated whenever a different extrinsic is given as input.
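As a rough illustration (the helper name and rotation range below are assumptions, not the exact code in our data pipeline), the augmentation amounts to perturbing the extrinsic with a small random rotation before the frustum is built:

```python
import numpy as np

def randomly_rotate_extrinsic(lidar2cam, max_deg=7.5):
    """Apply a small random rotation to a 4x4 lidar-to-camera extrinsic.

    Hypothetical helper for illustration; the actual augmentation in PETR
    is configured in the data pipeline referenced above.
    """
    angle = np.deg2rad(np.random.uniform(-max_deg, max_deg))
    c, s = np.cos(angle), np.sin(angle)
    # Rotation about the vertical axis of the lidar/ego frame.
    rot = np.array([[c, -s, 0, 0],
                    [s,  c, 0, 0],
                    [0,  0, 1, 0],
                    [0,  0, 0, 1]], dtype=lidar2cam.dtype)
    # Composing the perturbation with the extrinsic changes the frustum
    # that is later back-projected for the 3D position embedding.
    return lidar2cam @ rot

# Example: perturb one camera's extrinsic per training sample.
lidar2cam = np.eye(4)
aug_lidar2cam = randomly_rotate_extrinsic(lidar2cam)
```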
However, in practical applications the performance degradation is still relatively large. Besides the different extrinsics, there are other reasons: 1) the domain gap, such as different data scenes and camera FOVs; 2) the 2D PE adopted in the original PETR (https://github.com/megvii-research/PETR/blob/main/projects/configs/petr/petr_vovnet_gridmask_p4_800x320.py#L48). We use a multi-view 2D PE, which brings little improvement in accuracy while hurting speed and generalization, so we removed it in subsequent versions (StreamPETR).
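For reference, that 2D PE is the standard sine/cosine positional encoding computed over each camera's feature map; a minimal sketch of the idea (this function is illustrative, not the exact mmdet/PETR implementation):

```python
import torch

def sine_pe_2d(h, w, num_feats=128, temperature=10000):
    """Generic 2D sine/cosine positional embedding for an HxW feature map."""
    y = torch.arange(h, dtype=torch.float32).unsqueeze(1).expand(h, w)
    x = torch.arange(w, dtype=torch.float32).unsqueeze(0).expand(h, w)
    dim_t = torch.arange(num_feats, dtype=torch.float32)
    dim_t = temperature ** (2 * torch.div(dim_t, 2, rounding_mode='floor') / num_feats)
    pos_x = x.unsqueeze(-1) / dim_t
    pos_y = y.unsqueeze(-1) / dim_t
    pos_x = torch.stack((pos_x[..., 0::2].sin(), pos_x[..., 1::2].cos()), dim=-1).flatten(-2)
    pos_y = torch.stack((pos_y[..., 0::2].sin(), pos_y[..., 1::2].cos()), dim=-1).flatten(-2)
    return torch.cat((pos_y, pos_x), dim=-1)  # (H, W, 2 * num_feats)

# A "multi-view" variant simply applies this per camera feature map,
# e.g. broadcasting the same (H, W, 256) embedding over the N views.
pe = sine_pe_2d(20, 50)
print(pe.shape)  # torch.Size([20, 50, 256])
```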
Hi @yingfei1016,
I tried to adapt it to the LYFT dataset by changing the config as shown below.
Attached are my yml and a sample output from PETR for detection only, but there appears to be a shift and scale offset in the detected boxes. Please kindly point out where I am going wrong.
Thank you,
Hi, great work; I have learned a lot since I started using it.
I want to understand the significance of the extrinsic matrix (lidar2cam) in model training, since the 3D points (frustum) and image features are passed to an MLP to generate the position embedding (PE).
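For reference, my rough understanding of that flow is sketched below (the shapes, value ranges, and layer sizes are illustrative assumptions, not the exact code in the PETR repo):

```python
import torch
import torch.nn as nn

# Sketch of how the extrinsic enters the 3D PE in PETR-style models:
# frustum points in image space are mapped to the lidar/ego frame with
# img2lidar (which contains the extrinsic), and an MLP turns the
# normalized 3D coordinates into the position embedding.
D, H, W, C = 64, 20, 50, 256
pc_range = torch.tensor([-51.2, -51.2, -5.0, 51.2, 51.2, 3.0])

# Frustum: (u*d, v*d, d, 1) homogeneous points for every feature location.
u = torch.linspace(0, 800, W).view(1, 1, W).expand(D, H, W)
v = torch.linspace(0, 320, H).view(1, H, 1).expand(D, H, W)
d = torch.linspace(1.0, 61.2, D).view(D, 1, 1).expand(D, H, W)
frustum = torch.stack([u * d, v * d, d, torch.ones_like(d)], dim=-1)  # (D, H, W, 4)

# img2lidar = inverse(intrinsic @ lidar2cam): this is where the extrinsic
# matrix influences the resulting 3D coordinates.
img2lidar = torch.eye(4)  # placeholder for a real per-camera matrix
coords3d = (frustum @ img2lidar.T)[..., :3]                           # (D, H, W, 3)
coords3d = (coords3d - pc_range[:3]) / (pc_range[3:] - pc_range[:3])  # normalize

# MLP (1x1 convs) mapping the depth-stacked coords to a C-dim PE per pixel.
position_encoder = nn.Sequential(
    nn.Conv2d(D * 3, C * 4, kernel_size=1), nn.ReLU(inplace=True),
    nn.Conv2d(C * 4, C, kernel_size=1),
)
coords3d = coords3d.permute(3, 0, 1, 2).reshape(1, D * 3, H, W)
pos_embed = position_encoder(coords3d)  # (1, C, H, W), added to image features
```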