naver / dust3r

DUSt3R: Geometric 3D Vision Made Easy
https://dust3r.europe.naverlabs.com/

How does dust3r use known camera poses in global alignment? #143

Open resurgo97 opened 3 months ago

resurgo97 commented 3 months ago

Hello,

I'm testing dust3r on the Lego dataset with known camera poses, to see whether this improves the (world-coordinate) pointmaps produced by the global alignment optimization. I've successfully loaded and set the camera poses using preset_pose (as mentioned in #30), but I'm unsure whether this actually affects, let alone improves, the global alignment optimization (equation (5)).
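
For reference, equation (5) from the paper (the global alignment objective), as I read it, optimizes a pose and scale per edge rather than per image:

\chi^* = \arg\min_{\chi, P, \sigma} \sum_{e \in \mathcal{E}} \sum_{v \in e} \sum_{i=1}^{HW} C^{v,e}_i \left\lVert \chi^v_i - \sigma_e P_e X^{v,e}_i \right\rVert

where \chi^v are the world-frame pointmaps being optimized, P_e and \sigma_e are the per-edge pose and scale, and X^{v,e}_i are the pairwise network predictions.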

In preset_pose, when known poses are provided, it updates self.im_poses:

self._no_grad(self._set_pose(self.im_poses, idx, torch.tensor(pose), force=True))
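
For context, here is a minimal sketch of how I am setting them, following the demo pipeline from the README (the exact argument values are my own choices):

from dust3r.cloud_opt import global_aligner, GlobalAlignerMode

scene = global_aligner(output, device='cuda', mode=GlobalAlignerMode.PointCloudOptimizer)
# known_poses: one 4x4 cam-to-world matrix (torch tensor) per image
scene.preset_pose(known_poses)  # internally calls _set_pose(..., force=True) under no_grad
loss = scene.compute_global_alignment(init='mse', niter=300, schedule='cosine', lr=0.01)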

However, during compute_global_alignment, self.pw_poses, which are randomly initialized, are used and updated:

def forward(self):
    pw_poses = self.get_pw_poses()  # cam-to-world
    pw_adapt = self.get_adaptors().unsqueeze(1)
    proj_pts3d = self.get_pts3d(raw=True)

Also, it appears that self.im_poses is used to position the cameras in the GUI (though I'm not sure).

I'm confused about why the code has a separate self.pw_poses attribute instead of reusing self.im_poses, since, as I understand it, the pairwise poses should just be a subset of self.im_poses.

Could you please clarify:

  1. Does setting known poses through preset_pose actually affect or improve the global alignment process?
  2. Why are there separate self.pw_poses and self.im_poses attributes, and how do they interact during optimization?

Thank you

yocabon commented 3 months ago
pw_poses = self.get_pw_poses()  # one pose per edge, with scaling -> think of it as the pose of the first image in the pair, plus a per-edge scale factor
pw_adapt = self.get_adaptors().unsqueeze(1)  # small correction to compensate for badly estimated intrinsics

# predictions for the first image in the pair, rescaled and mapped to world space
aligned_pred_i = geotrf(pw_poses, pw_adapt * self._stacked_pred_i)
# predictions for the second image in the pair, likewise (pred_j lives in the same frame as pred_i)
aligned_pred_j = geotrf(pw_poses, pw_adapt * self._stacked_pred_j)

proj_pts3d = self.get_pts3d(raw=True)
# proj_pts3d[self._ei] = geotrf(im_poses[self._ei], rel_ptmaps)  # optimized pointmaps, converted to world space using im_poses

# distance between the predictions and the optimized pointmaps, now in the same (world) space
li = self.dist(proj_pts3d[self._ei], aligned_pred_i, weight=self._weight_i).sum() / self.total_area_i
lj = self.dist(proj_pts3d[self._ej], aligned_pred_j, weight=self._weight_j).sum() / self.total_area_j

return li + lj

If self.im_poses is frozen, then pw_poses will still have to converge to values consistent with those poses: proj_pts3d is computed from im_poses, and the loss forces the pairwise-aligned predictions to match proj_pts3d. So presetting poses does constrain the optimization, even though the pw_poses variables themselves stay free.
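
A toy illustration of that constraint (not dust3r code; translation-only "poses" for simplicity, with scale fixed to 1):

import torch

torch.manual_seed(0)
pred = torch.rand(100, 3)                  # pairwise prediction, in the pair's frame
t_im = torch.tensor([1.0, 2.0, 3.0])       # frozen im_pose (translation only)
proj_pts3d = pred + t_im                   # world pointmap built from the frozen pose

t_pw = torch.zeros(3, requires_grad=True)  # free pw_pose analogue
opt = torch.optim.Adam([t_pw], lr=0.1)
for _ in range(500):
    opt.zero_grad()
    loss = ((pred + t_pw) - proj_pts3d).square().sum()  # li-style distance term
    loss.backward()
    opt.step()
print(t_pw.detach())  # ~[1, 2, 3]: the free pose is pulled to match the frozen one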