tjiiv-cprg / EPro-PnP

[CVPR 2022 Oral, Best Student Paper] EPro-PnP: Generalized End-to-End Probabilistic Perspective-n-Points for Monocular Object Pose Estimation
https://www.youtube.com/watch?v=TonBodQ6EUU
Apache License 2.0

Question about the coordinate transform in the data converter #23

Open qinyq opened 2 years ago

qinyq commented 2 years ago

Hi there, I tried the pretrained model and it works well. I am currently reading the code and got confused by the following code:

https://github.com/tjiiv-cprg/EPro-PnP/blob/2bd75c410b7c75fe7fb51b9efaa0746f85b6b095/EPro-PnP-Det/tools/data_converter/nuscenes_converter.py#L472

where it says

    # get image projection
    cam_points = (lidar_points - cam_info['sensor2lidar_translation']
                  ) @ cam_info['sensor2lidar_rotation']

Why do you use cam_info['sensor2lidar_rotation'] here? It seems that cam_info['sensor2lidar_rotation'] represents the rotation from the sensor (camera) to the lidar, so to transform from the lidar frame you would need its inverse, e.g. np.linalg.inv(cam_info['sensor2lidar_rotation']). Why not use the inverse? Thank you.

Lakonik commented 2 years ago

Since the coordinates in lidar_points are arranged as row vectors instead of the usual column vectors used in equations, we correspondingly put the transposed rotation matrix on the right of the multiplication. The rotation matrix therefore needs to be inverted and then transposed, which returns it to its original form.
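
To make this concrete, here is a minimal numpy sketch of that equivalence. The names R and t are illustrative stand-ins for cam_info['sensor2lidar_rotation'] and cam_info['sensor2lidar_translation'], not code from the converter:

    import numpy as np
    from scipy.spatial.transform import Rotation

    # Illustrative stand-ins (NOT the converter's actual values):
    # R plays the role of cam_info['sensor2lidar_rotation'], i.e. it maps
    # camera(sensor)-frame points into the lidar frame, so lidar -> camera needs R^-1 = R^T.
    R = Rotation.random().as_matrix()      # a random proper rotation (SO(3)) matrix
    t = np.random.randn(3)                 # plays the role of sensor2lidar_translation
    lidar_points = np.random.randn(5, 3)   # N x 3 points (row vectors) in the lidar frame

    # Column-vector form: p_cam = R^-1 (p_lidar - t)
    cam_points_col = (np.linalg.inv(R) @ (lidar_points - t).T).T

    # Row-vector form used in the converter: (p_lidar - t) @ (R^-1)^T = (p_lidar - t) @ R
    cam_points_row = (lidar_points - t) @ R

    assert np.allclose(cam_points_col, cam_points_row)
    assert np.allclose(np.linalg.inv(R).T, R)  # inverting then transposing returns R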

qinyq commented 2 years ago

> Since the coordinates in lidar_points are arranged as row vectors instead of the usual column vectors used in equations, we correspondingly put the transposed rotation matrix on the right of the multiplication. The rotation matrix therefore needs to be inverted and then transposed, which returns it to its original form.

Thank you for your response. My question is why the code I quoted above does NOT use the inverse rotation matrix, but instead multiplies directly by cam_info['sensor2lidar_rotation'].

qinyq commented 2 years ago

Another question: in nuscenes_converter.py we need to extract the lidar points within each 3D bounding box to construct the 2D-3D-2DW correspondences. So how are the axes of a detected object defined (i.e., for each detected object, which directions are its x, y, and z axes)? Or do you use the same axis definition as each camera? Thank you in advance.

Lakonik commented 2 years ago

The inverse of an SO(3) matrix equals its transpose, so the transpose of its inverse is the matrix itself. This converter uses the object coordinate system defined in the nuScenes dataset, but the dataloader will convert it to the KITTI-format directions (x forward, y downward, z leftward):

https://github.com/tjiiv-cprg/EPro-PnP/blob/2bd75c410b7c75fe7fb51b9efaa0746f85b6b095/EPro-PnP-Det/epropnp_det/datasets/nuscenes3d_dataset.py#L186
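
For intuition only, here is a hypothetical axis-remap sketch. The source-frame convention (x forward, y left, z up) is assumed purely for illustration and is not taken from the converter; see the linked dataloader code for the actual transform:

    import numpy as np

    # Hypothetical axis remap. Assume the source object frame is x forward, y left, z up
    # (check the nuScenes devkit for the real convention); the target is the KITTI-style
    # frame mentioned above (x forward, y downward, z leftward).
    M = np.array([[1., 0.,  0.],   # x_tgt (forward) =  x_src (forward)
                  [0., 0., -1.],   # y_tgt (down)    = -z_src (up)
                  [0., 1.,  0.]])  # z_tgt (left)    =  y_src (left)

    assert np.isclose(np.linalg.det(M), 1.0)  # a proper rotation, so inv(M) == M.T

    p_src = np.array([2.0, 0.5, 1.5])  # a point in the assumed source object frame
    p_tgt = M @ p_src                  # -> array([ 2. , -1.5,  0.5])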

qinyq commented 2 years ago

> The inverse of an SO(3) matrix equals its transpose, so the transpose of its inverse is the matrix itself. This converter uses the object coordinate system defined in the nuScenes dataset, but the dataloader will convert it to the KITTI-format directions (x forward, y downward, z leftward):
>
> https://github.com/tjiiv-cprg/EPro-PnP/blob/2bd75c410b7c75fe7fb51b9efaa0746f85b6b095/EPro-PnP-Det/epropnp_det/datasets/nuscenes3d_dataset.py#L186

Got it. Thank you!