Hi, your work is great. I notice that you obtain the camera poses of 7-Scenes in setup_7scenes.py via `cam_pose = np.matmul(cam_pose, np.linalg.inv(d_to_rgb))`. Why not use the loaded `cam_pose` directly? And I found there is a line `eye_coords = np.matmul(d_to_rgb, eye_coords)`. Can you help me understand it? Thank you!
Hi, thanks for your question.
`cam_pose`, when loaded from disk here, represents the transform from the current location of the depth sensor to the origin of the coordinate system. For clarity in this answer, let's define it as `origin_T_depth`, so that when read from right to left it reads as depth -> origin. Using the same notation, we can rename `d_to_rgb` (i.e. depth -> rgb) as `rgb_T_depth`.
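To make the `a_T_b` convention concrete: a point expressed in frame `b` is mapped into frame `a` as `p_a = a_T_b @ p_b`. Here is a minimal sketch with a made-up transform (the values are placeholders, not from the dataset):

```python
import numpy as np

# Made-up 4x4 homogeneous transform, purely to illustrate the a_T_b notation.
# With this convention, a point in frame b maps into frame a as p_a = a_T_b @ p_b.
origin_T_depth = np.eye(4)
origin_T_depth[:3, 3] = [0.5, 0.0, 1.0]  # depth sensor sits at (0.5, 0, 1) in the origin frame

p_depth = np.array([0.0, 0.0, 2.0, 1.0])  # homogeneous point 2 m in front of the depth sensor
p_origin = origin_T_depth @ p_depth       # the same point, expressed in the origin frame
print(p_origin)                           # [0.5 0.  3.  1. ]
```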
ACE is RGB-only, so we actually need the rgb -> origin transform, therefore we use the formula you quoted:

```python
cam_pose = np.matmul(cam_pose, np.linalg.inv(d_to_rgb))
```

which, replacing the variable names with the notation I defined above, becomes:

```
origin_T_rgb = origin_T_depth @ (rgb_T_depth)^-1 = origin_T_depth @ depth_T_rgb
```
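As a quick numeric sanity check of that identity (the two matrices below are made-up stand-ins for the real pose and calibration matrices in the script):

```python
import numpy as np

# Made-up stand-ins for the matrices the setup script actually loads.
rgb_T_depth = np.eye(4)
rgb_T_depth[:3, 3] = [-0.025, 0.0, 0.0]   # e.g. a 2.5 cm baseline between the two sensors

origin_T_depth = np.eye(4)
origin_T_depth[:3, 3] = [0.5, 0.0, 1.0]

# The line from the setup script, in the notation above:
origin_T_rgb = origin_T_depth @ np.linalg.inv(rgb_T_depth)

# Sanity check: chaining depth_T_rgb and origin_T_depth gives the same mapping
# for any point expressed in the RGB sensor's frame.
p_rgb = np.array([0.1, 0.2, 1.5, 1.0])
p_origin_direct = origin_T_rgb @ p_rgb
p_origin_chained = origin_T_depth @ (np.linalg.inv(rgb_T_depth) @ p_rgb)
assert np.allclose(p_origin_direct, p_origin_chained)
```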
The second formula you quoted:

```python
eye_coords = np.matmul(d_to_rgb, eye_coords)
```

is similarly used to transform the depth maps captured by the Kinect, which are defined in a coordinate system with its origin at the center of the depth sensor, into depth maps with the origin centered on the RGB sensor.
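In other words, the depth pixels are back-projected into 3D points in the depth camera's frame and then moved into the RGB camera's frame. A sketch under the assumption that `eye_coords` is a 4xN array of homogeneous camera-space points (the array values and the `d_to_rgb` matrix here are made up):

```python
import numpy as np

# Assumed layout: eye_coords is 4xN, one homogeneous 3D point per depth pixel,
# expressed in the depth sensor's camera frame.
eye_coords = np.array([
    [0.1, -0.2, 0.0],   # x
    [0.0,  0.3, 0.1],   # y
    [1.5,  2.0, 2.5],   # z (metric depth)
    [1.0,  1.0, 1.0],   # homogeneous coordinate
])

d_to_rgb = np.eye(4)
d_to_rgb[:3, 3] = [-0.025, 0.0, 0.0]  # made-up depth -> rgb transform

# Same points, now expressed in the RGB sensor's frame; re-projecting them
# with the RGB intrinsics yields a depth map registered to the RGB image.
eye_coords = np.matmul(d_to_rgb, eye_coords)
```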
Note that, since ACE is RGB-only, this part of the setup script is actually not necessary for our training. We just use the RGB images, camera poses, and camera intrinsics.
Hopefully this answers your questions, but I'm happy to clarify further if necessary.
Thank you for your explanation!