Hi, your work is great. I notice that you obtain the camera poses of 7-Scenes in setup_7scenes.py via `cam_pose = np.matmul(cam_pose, np.linalg.inv(d_to_rgb))`. Why not use the loaded `cam_pose` directly? And I found there is a line `eye_coords = np.matmul(d_to_rgb, eye_coords)`. Can you help me understand it? Thank you!
Hi, thanks for your question.
`cam_pose`, when loaded from disk here, represents the transform from the current location of the depth sensor to the origin of the coordinate system. For clarity in this answer, let's define it as `origin_T_depth`, so that when read from right to left it reads as depth -> origin. Using the same notation, we can rename `d_to_rgb` (i.e. depth -> rgb) as `rgb_T_depth`.
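To make the `a_T_b` convention concrete: a point expressed in frame `b` is mapped into frame `a` as `p_a = a_T_b @ p_b`. Here is a minimal sketch with a made-up transform (the values are placeholders, not from the dataset):

```python
import numpy as np

# Made-up 4x4 homogeneous transform, purely to illustrate the a_T_b notation.
# With this convention, a point in frame b maps into frame a as p_a = a_T_b @ p_b.
origin_T_depth = np.eye(4)
origin_T_depth[:3, 3] = [0.5, 0.0, 1.0]  # depth sensor sits at (0.5, 0, 1) in the origin frame

p_depth = np.array([0.0, 0.0, 2.0, 1.0])  # homogeneous point 2 m in front of the depth sensor
p_origin = origin_T_depth @ p_depth       # the same point, expressed in the origin frame
print(p_origin)                           # [0.5 0.  3.  1. ]
```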
ACE is RGB-only, so we actually need the rgb -> origin transform, therefore we use the formula you quoted:

```python
cam_pose = np.matmul(cam_pose, np.linalg.inv(d_to_rgb))
```

which, replacing the variable names with the notation I defined above, becomes:

```
origin_T_rgb = origin_T_depth @ (rgb_T_depth)^-1 = origin_T_depth @ depth_T_rgb
```
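As a quick numeric sanity check of that identity (the two matrices below are made-up stand-ins for the real pose and calibration matrices in the script):

```python
import numpy as np

# Made-up stand-ins for the matrices the setup script actually loads.
rgb_T_depth = np.eye(4)
rgb_T_depth[:3, 3] = [-0.025, 0.0, 0.0]   # e.g. a 2.5 cm baseline between the two sensors

origin_T_depth = np.eye(4)
origin_T_depth[:3, 3] = [0.5, 0.0, 1.0]

# The line from the setup script, in the notation above:
origin_T_rgb = origin_T_depth @ np.linalg.inv(rgb_T_depth)

# Sanity check: chaining depth_T_rgb and origin_T_depth gives the same mapping
# for any point expressed in the RGB sensor's frame.
p_rgb = np.array([0.1, 0.2, 1.5, 1.0])
p_origin_direct = origin_T_rgb @ p_rgb
p_origin_chained = origin_T_depth @ (np.linalg.inv(rgb_T_depth) @ p_rgb)
assert np.allclose(p_origin_direct, p_origin_chained)
```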
The second formula you quoted:

```python
eye_coords = np.matmul(d_to_rgb, eye_coords)
```

is similarly used to transform the depth maps captured by the Kinect, which are defined in a coordinate system with its origin at the center of the depth sensor, into depth maps with the origin centered on the RGB sensor.
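In other words, the depth pixels are back-projected into 3D points in the depth camera's frame and then moved into the RGB camera's frame. A sketch under the assumption that `eye_coords` is a 4xN array of homogeneous camera-space points (the array values and the `d_to_rgb` matrix here are made up):

```python
import numpy as np

# Assumed layout: eye_coords is 4xN, one homogeneous 3D point per depth pixel,
# expressed in the depth sensor's camera frame.
eye_coords = np.array([
    [0.1, -0.2, 0.0],   # x
    [0.0,  0.3, 0.1],   # y
    [1.5,  2.0, 2.5],   # z (metric depth)
    [1.0,  1.0, 1.0],   # homogeneous coordinate
])

d_to_rgb = np.eye(4)
d_to_rgb[:3, 3] = [-0.025, 0.0, 0.0]  # made-up depth -> rgb transform

# Same points, now expressed in the RGB sensor's frame; re-projecting them
# with the RGB intrinsics yields a depth map registered to the RGB image.
eye_coords = np.matmul(d_to_rgb, eye_coords)
```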
Note that, since ACE is RGB-only, this part of the setup script is actually not necessary for our training. We just use the RGB images, camera poses, and camera intrinsics.
Hopefully this answers your questions, but I'm happy to clarify further if necessary.
Thank you for your explanation!