nutonomy / nuscenes-devkit

The devkit of the nuScenes dataset.
https://www.nuScenes.org

Project 3D Center Bounding Box to Image #1040

Closed JohnPekl closed 4 months ago

JohnPekl commented 4 months ago

I need to build the projection matrix from the rotation, translation, and camera_intrinsic values provided in nuScenes. The code to obtain the projection matrix is shown below, and I need to check whether it is correct.

Hence, I tried to project a 3D bounding-box center to an image (using the projection matrix), but the resulting point falls outside the image range.

cam_front_data = nusc.get('sample_data', my_sample['data'][sensor])
rdata = nusc.get_sample_data(cam_front_data['token'])  # data_path, box_list, cam_intrinsic
calibrations = nusc.get('calibrated_sensor', cam_front_data['calibrated_sensor_token'])
translation = calibrations['translation']
rotation = calibrations['rotation']
rotation_matrix = Quaternion(rotation).rotation_matrix
camera_intrinsic = calibrations['camera_intrinsic']
extrinsic_cam = np.column_stack((rotation_matrix, translation))
cam_mat = np.dot(camera_intrinsic, extrinsic_cam)  # PROJECTION MATRIX
box_vis_level = BoxVisibility.ALL
data_path, boxes, camera_intrinsic = nusc.get_sample_data(cam_front_data['token'],
                                                          box_vis_level=box_vis_level)
img = Image.open(data_path)
for box in boxes:
    ax.imshow(img)
    c = np.array(nusc.colormap[box.name]) / 255.0
    box.render(ax, view=camera_intrinsic, normalize=True, colors=(c, c, c))

    c = box.center
    vet2 = np.array([[c[0]], [c[1]], [c[2]], [1]])
    temp_c = np.dot(cam_mat, vet2)
    point_3d2img = temp_c[0:2] / temp_c[2]

For example, with scene-0103, CAM_BACK (/nuScenes/v1.0-mini/samples/CAM_BACK/n008-2018-08-01-15-16-36-0400__CAM_BACK__1533151603537558.jpg), box.center is c = [-0.34674763, 0.74665736, 18.78830883] and the output is point_3d2img = [[-14908.95759742], [345.79102845]], while the image size is 900x1600.

What is wrong with my code?

whyekit-motional commented 4 months ago

@JohnPekl you can use the following part of the devkit as a reference: https://github.com/nutonomy/nuscenes-devkit/blob/9b165b1018a64623b65c17b64f3c9dd746040f36/python-sdk/nuscenes/nuscenes.py#L294-L300
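The referenced devkit code maps a box from the global frame into the camera frame in two steps, each time translating by the negative translation and then rotating by the inverse rotation. A minimal NumPy-only sketch of that chain, where `rot_z` and all pose/calibration numbers are hypothetical stand-ins (the real devkit uses `Quaternion(...).rotation_matrix` from pyquaternion):

```python
import numpy as np

def rot_z(theta):
    # Rotation about the z-axis; stands in for Quaternion(...).rotation_matrix.
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

# Hypothetical ego pose and sensor calibration (not real nuScenes values).
ego_t, ego_R = np.array([10.0, 5.0, 0.0]), rot_z(0.3)
cs_t, cs_R = np.array([1.5, 0.0, 1.6]), rot_z(-0.1)

p_global = np.array([20.0, 8.0, 1.0])

# Step 1: global frame -> ego-vehicle frame (translate, then inverse-rotate).
p_ego = ego_R.T @ (p_global - ego_t)
# Step 2: ego-vehicle frame -> camera-sensor frame.
p_cam = cs_R.T @ (p_ego - cs_t)
```

For a rotation matrix the inverse equals the transpose, which is why `R.T` plays the role of `.inverse` here.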

JohnPekl commented 4 months ago

Hi @whyekit-motional, I have tried to follow your suggestion (first transform into the ego-vehicle coordinate system, then into the sensor coordinate system), but I still do not know how to obtain the projection matrix correctly (my goal is a 3x4 projection matrix).

Here is my code; the resulting point point_3d2img does not land at the center of the bounding box.

cs_translation = cs_record['translation']
cs_rotation = cs_record['rotation']
cs_rotation_matrix = Quaternion(cs_rotation).rotation_matrix
cs_camera_intrinsic = cs_record['camera_intrinsic']
cs_extrinsic_cam = np.column_stack((cs_rotation_matrix, cs_translation))
cs_cam_mat = np.dot(cs_camera_intrinsic, cs_extrinsic_cam)  # 3x4 camera matrix used below

pose_translation = pose_record['translation']
pose_rotation = pose_record['rotation']
pose_rotation_matrix = Quaternion(pose_rotation).rotation_matrix
pose_extrinsic_cam = np.column_stack((pose_rotation_matrix, pose_translation))

for box in boxes:
    ax.imshow(img)
    c = np.array(nusc.colormap[box.name]) / 255.0
    box.render(ax, view=cs_camera_intrinsic, normalize=True, colors=(c, c, c))

    c = box.center
    vet2 = np.array([[c[0]], [c[1]], [c[2]], [1]])
    temp_c = np.dot(cs_cam_mat, np.append(np.dot(pose_extrinsic_cam, vet2), 1)[:, np.newaxis])
    point_3d2img = temp_c[0:2] / temp_c[2]

whyekit-motional commented 4 months ago

@JohnPekl I suppose by "3x4 projection matrix" you are referring to the camera intrinsic matrix. If so, note that the camera intrinsic matrix has nothing to do with the location and rotation of the camera; focal length, aperture, field of view, resolution, etc. are the parameters that determine the intrinsic matrix of a camera.
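To illustrate the point above: the intrinsic matrix is 3x3 and maps camera-frame coordinates to pixels, with no pose information in it. A sketch with hypothetical focal lengths and principal point (the numbers are made up, not taken from any nuScenes calibration):

```python
import numpy as np

# Hypothetical intrinsics: focal lengths (fx, fy) and principal point (cx, cy).
fx, fy, cx, cy = 1266.4, 1266.4, 816.3, 491.5
K = np.array([[fx, 0.0, cx],
              [0.0, fy, cy],
              [0.0, 0.0, 1.0]])

# Project a point already expressed in the camera frame (z is the depth).
p_cam = np.array([2.0, 1.0, 10.0])
uvw = K @ p_cam
uv = uvw[:2] / uvw[2]  # pixel coordinates (u, v)
```

The pose of the camera never enters: moving the camera changes the extrinsics, not K.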

JohnPekl commented 4 months ago

@whyekit-motional The 3x4 projection matrix is the combination of the intrinsic and extrinsic camera parameters. With this matrix, we only need one matrix multiplication for the 3D-to-2D transformation.

It is calculated by multiplying them together: cam_mat = np.dot(camera_intrinsic, extrinsic_cam) # PROJECTION MATRIX (shown in the image below)

This calculation works if rotation is applied before translation. However, in your reference code, translation is applied before rotation.

(image: the projection-matrix formula)

whyekit-motional commented 4 months ago

@JohnPekl in nuScenes, the inverse of the rotation is used (note the .inverse in the referenced code)
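Translating by -t first and then rotating by the inverse rotation is exactly the inverse rigid transform, which can still be folded into a single 3x4 extrinsic of the form [R⁻¹ | -R⁻¹t]. A numeric check of that equivalence (NumPy only; `rot_z` and all values are hypothetical):

```python
import numpy as np

def rot_z(theta):
    # Rotation about the z-axis; stands in for a quaternion's rotation matrix.
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

R, t = rot_z(0.7), np.array([3.0, -2.0, 1.0])  # hypothetical sensor pose
x = np.array([5.0, 4.0, 0.5])                  # a point in the parent frame

# Devkit order: translate by -t first, then rotate by the inverse rotation.
step_by_step = R.T @ (x - t)

# Equivalent single 3x4 extrinsic: [R^-1 | -R^-1 t] applied to homogeneous x.
E = np.column_stack((R.T, -R.T @ t))
one_shot = E @ np.append(x, 1.0)
```

So the translate-before-rotate order in the devkit does not prevent building a one-shot projection matrix; it only changes which translation goes into the extrinsic.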

JohnPekl commented 4 months ago

I finally obtained the projection matrix prj_mat with the code below. Thanks to @whyekit-motional for the active support.

t1 = -np.array(pose_record['translation'])
pose_rotation = pose_record['rotation']
R1 = Quaternion(pose_rotation).inverse.rotation_matrix
# R @ (x + t) = R @ x + R @ t, so the translation in the rotated frame is t' = R @ t.
t1 = np.dot(R1, t1)
E1 = np.column_stack((R1, t1))

t2 = -np.array(cs_record['translation'])
cs_rotation = cs_record['rotation']
R2 = Quaternion(cs_rotation).inverse.rotation_matrix
# R @ (x + t) = R @ x + R @ t, so the translation in the rotated frame is t' = R @ t.
t2 = np.dot(R2, t2)
E2 = np.column_stack((R2, t2))
cs_camera_intrinsic = cs_record['camera_intrinsic']

# The combined transformation matrix E is obtained by multiplying the homogeneous E2 and E1.
E = np.dot(np.vstack((E2, [0, 0, 0, 1])), np.vstack((E1, [0, 0, 0, 1])))
prj_mat = np.dot(cs_camera_intrinsic, E[0:3, :])

How to use prj_mat?

boxes_3d = nusc.get_boxes(cam_front_data['token'])
for box_3d in boxes_3d:
    cc = np.copy(box_3d.center)
    vet2 = np.array([[cc[0]], [cc[1]], [cc[2]], [1]])
    temp_c = np.dot(prj_mat, vet2)
    point_3d2img = temp_c[0:2] / temp_c[2]

We transform the 3D point cc into the image point point_3d2img using only a single matrix multiplication, np.dot(prj_mat, vet2).
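As a sanity check of this construction, composing K with the two inverse extrinsics should reproduce the step-by-step global → ego → camera transform. A self-contained sketch with hypothetical poses and intrinsics (`rot_z`, `inv_extrinsic`, and every number here are illustrative stand-ins, not nuScenes data):

```python
import numpy as np

def rot_z(theta):
    # Rotation about the z-axis; stands in for a quaternion's rotation matrix.
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def inv_extrinsic(R, t):
    # 4x4 inverse rigid transform: top rows [R^-1 | -R^-1 t], bottom row [0 0 0 1].
    E = np.eye(4)
    E[:3, :3] = R.T
    E[:3, 3] = -R.T @ t
    return E

# Hypothetical ego pose, sensor calibration, and intrinsics (not real values).
ego_R, ego_t = rot_z(0.2), np.array([100.0, 50.0, 0.0])
cs_R, cs_t = rot_z(-1.5), np.array([1.7, 0.0, 1.5])
K = np.array([[1266.0, 0.0, 800.0],
              [0.0, 1266.0, 450.0],
              [0.0, 0.0, 1.0]])

# One-shot 3x4 projection matrix: K times the top three rows of (cs^-1 @ ego^-1).
E = inv_extrinsic(cs_R, cs_t) @ inv_extrinsic(ego_R, ego_t)
prj_mat = K @ E[:3, :]

# Project a global homogeneous point with a single multiplication.
p_global = np.array([110.0, 60.0, 3.0, 1.0])
uvw = prj_mat @ p_global
uv = uvw[:2] / uvw[2]
```

In practice one should also check that the depth (the third component of uvw) is positive before using uv, since points behind the camera project to spurious pixel coordinates.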