isl-org / TanksAndTemples

Toolbox for the TanksAndTemples benchmark website

Confusion with alignment matrix #31

Closed laurelkeys closed 3 years ago

laurelkeys commented 3 years ago

Hi, while I was going through the code in python_toolbox/evaluation/ to better understand how the evaluation metrics are computed, I got a little confused by the way alignment / transformation matrices are applied.

From what I understand, the adopted convention is that the matrices align the reconstructed pose to the ground truth (as mentioned in https://github.com/intel-isl/TanksAndTemples/issues/12#issuecomment-391831523 and in section 3-1 of the tutorial), i.e., using Open3D's parameter names: "source = reconstructed / estimate" and "target = ground-truth".

Hence, in run_evaluation() the transformation matrix gt_trans should align the reconstruction to the ground-truth (right?).

However, in trajectory_alignment() the transformation is applied to the ground-truth trajectory: https://github.com/intel-isl/TanksAndTemples/blob/90cd206d6991acec775cf8a2788517d7ecc30c2f/python_toolbox/evaluation/registration.py#L65-L69

Does it make sense to apply a "reference to ground-truth" transform to data in the ground-truth coordinate frame? Shouldn't this use the inverse transform, effectively taking "ground-truth to reference" (i.e. traj_pcd_col.transform(np.linalg.inv(gt_trans)))? Or instead, apply the transformation to the reference data (traj_to_register_pcd in this case)?
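To make the direction question concrete, here is a minimal NumPy sketch, not taken from the toolbox. The helper apply_transform and the example matrix are hypothetical; the only assumption is the usual convention that a 4x4 homogeneous matrix named "source to target" maps points expressed in the source frame into the target frame.

```python
import numpy as np

def apply_transform(T, points):
    """Apply a 4x4 homogeneous transform to an (N, 3) array of points."""
    homogeneous = np.hstack([points, np.ones((len(points), 1))])
    return (homogeneous @ T.T)[:, :3]

# Hypothetical "source -> target" transform: scale by 2, then shift by +1 along x.
source_to_target = np.array([
    [2.0, 0.0, 0.0, 1.0],
    [0.0, 2.0, 0.0, 0.0],
    [0.0, 0.0, 2.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
])
target_to_source = np.linalg.inv(source_to_target)

points_in_source = np.array([[0.0, 0.0, 0.0], [1.0, 2.0, 3.0]])
points_in_target = apply_transform(source_to_target, points_in_source)

# Going back to the source frame needs the inverse matrix.
round_trip = apply_transform(target_to_source, points_in_target)

# Applying the forward matrix to data that is already in the target frame
# does not bring it back; it moves the points somewhere else entirely.
applied_twice = apply_transform(source_to_target, points_in_target)
```

Under this convention, applying a "source to target" matrix only makes sense for data that actually lives in the source frame, which is why it matters which frame the trajectory is in before the transform is applied.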

Thank you.

arknapit commented 3 years ago

Hi Tiago,

The variable names are a bit ambiguous here, and I see how this can be confusing. The function trajectory_alignment() aligns the "source" camera trajectory with our known "target" trajectory and, in addition, puts it in real-world coordinates (LiDAR coordinates). This target trajectory (gt_traj_col, e.g. Ignatius_COLMAP_SfM.log) still lives in an arbitrary reference frame output by COLMAP. We therefore transform it to the LiDAR reference frame using gt_trans and then do the ICP between the camera positions to get the final trajectory alignment (it's a precursor to the final refinement with the dense point cloud later).

The "gt" in the name gt_traj_col just means that it is the camera trajectory we want to get traj_to_register aligned to, so it's basically the camera trajectory for which we know how it aligns to the GT reconstruction.

So, to summarize: for this automatic alignment procedure, all we usually have is the "source" camera poses in an arbitrary reference frame, so we need three additional things to calculate this "pre-alignment" step (see here):

  1. map_file = for temporal alignment, in case you used different frames from the video than the ones we provide as image samples (see tutorial A, case 2 for more details)
  2. gt_traj_col = the camera trajectory of the target reconstruction (it's not in real-world coordinates, it's just another reconstruction)
  3. gt_trans = the transformation that takes gt_traj_col to the actual real-world (LiDAR) coordinates
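The pre-alignment described above can be sketched with synthetic data. This is not the toolbox's implementation. It assumes the cameras are already matched one-to-one by index (so the map_file step is skipped), and it swaps in a closed-form Umeyama similarity fit in place of the registration the toolbox runs between camera positions. All helper names and numeric values are hypothetical.

```python
import numpy as np

def apply_transform(T, points):
    """Apply a 4x4 homogeneous transform to an (N, 3) array of points."""
    homogeneous = np.hstack([points, np.ones((len(points), 1))])
    return (homogeneous @ T.T)[:, :3]

def similarity(scale, angle_z, translation):
    """Build a 4x4 similarity: rotation about z, uniform scale, translation."""
    c, s = np.cos(angle_z), np.sin(angle_z)
    T = np.eye(4)
    T[:3, :3] = scale * np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    T[:3, 3] = translation
    return T

def fit_similarity(src, dst):
    """Least-squares similarity (Umeyama) mapping matched points src onto dst."""
    mu_src, mu_dst = src.mean(axis=0), dst.mean(axis=0)
    src_c, dst_c = src - mu_src, dst - mu_dst
    cov = dst_c.T @ src_c / len(src)
    U, D, Vt = np.linalg.svd(cov)
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        S[2, 2] = -1.0  # keep a proper rotation, not a reflection
    R = U @ S @ Vt
    scale = np.trace(np.diag(D) @ S) / (src_c ** 2).sum(axis=1).mean()
    T = np.eye(4)
    T[:3, :3] = scale * R
    T[:3, 3] = mu_dst - scale * R @ mu_src
    return T

rng = np.random.default_rng(0)

# Synthetic stand-ins: camera centers of the target reconstruction in the
# arbitrary COLMAP frame, and the known COLMAP -> LiDAR alignment.
gt_traj_col_positions = rng.uniform(-1.0, 1.0, size=(20, 3))
gt_trans = similarity(3.0, 0.4, [1.0, -2.0, 0.5])

# Step 1: move the target trajectory into real-world (LiDAR) coordinates.
target_positions = apply_transform(gt_trans, gt_traj_col_positions)

# Synthetic "source" trajectory: the same cameras in another arbitrary frame.
true_source_to_lidar = similarity(0.5, -1.1, [4.0, 0.0, 2.0])
source_positions = apply_transform(np.linalg.inv(true_source_to_lidar), target_positions)

# Step 2: estimate the source -> LiDAR pre-alignment from matched camera centers.
trajectory_transform = fit_similarity(source_positions, target_positions)
```

In this sketch gt_trans is only ever applied to the trajectory that lives in the COLMAP frame, and the estimated trajectory_transform is what carries the source trajectory into LiDAR coordinates.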

Let me know if there are still questions.

laurelkeys commented 3 years ago

Hi Arno,

Thank you for the detailed reply!

Does this mean that – besides gt_traj_col which is in COLMAP's (arbitrary) reference frame – all other data is either in the estimate reconstruction reference frame or in known real-world coordinates? That is:

And so, gt_trans = np.loadtxt(alignment) is a transformation matrix from COLMAP to "target" (i.e. real-world), while trajectory_transform is a transformation from "source" to "target" (and so are the three r*.transformation)?

arknapit commented 3 years ago

Exactly: trajectory_transform is the rough pre-alignment using the camera positions, and the registration refinement is done using the dense point clouds.
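One way to keep this frame bookkeeping straight is to tag each matrix with the frames it maps between and refuse to chain transforms whose frames do not match. The FrameTransform class below is a hypothetical helper, not part of the toolbox or of Open3D, and the matrices are placeholders.

```python
from dataclasses import dataclass

import numpy as np

@dataclass
class FrameTransform:
    """A 4x4 homogeneous matrix labeled with the frames it maps between."""
    src: str
    dst: str
    matrix: np.ndarray

    def inverse(self):
        return FrameTransform(self.dst, self.src, np.linalg.inv(self.matrix))

    def then(self, other):
        """Apply self first, then other; the frames have to chain."""
        if self.dst != other.src:
            raise ValueError(
                f"cannot chain {self.src}->{self.dst} with {other.src}->{other.dst}"
            )
        return FrameTransform(self.src, other.dst, other.matrix @ self.matrix)

# Placeholder matrices standing in for the two transforms discussed above.
colmap_to_lidar = np.diag([3.0, 3.0, 3.0, 1.0])
source_to_lidar = np.eye(4)
source_to_lidar[:3, 3] = [1.0, 2.0, 3.0]

gt_trans = FrameTransform("colmap", "lidar", colmap_to_lidar)
trajectory_transform = FrameTransform("source", "lidar", source_to_lidar)

# source -> lidar followed by lidar -> colmap gives source -> colmap.
source_to_colmap = trajectory_transform.then(gt_trans.inverse())

# Chaining source -> lidar directly with colmap -> lidar is rejected,
# because the intermediate frames differ.
try:
    trajectory_transform.then(gt_trans)
    chained_without_error = True
except ValueError:
    chained_without_error = False
```

The labels cost almost nothing and turn a silent frame mix-up into an immediate error.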

laurelkeys commented 3 years ago

Awesome, thank you for the replies!