haksorus / gsplatloc

GSplatLoc: Grounding Keypoint Descriptors into 3D Gaussian Splatting for Improved Visual Localization

matrix c2w in loc_inference.py #1

Open kongbia opened 3 days ago

kongbia commented 3 days ago

In loc_inference.py, the $R, t$ generated by PnP represent the world-to-camera transformation, as I understand it. Why are they assigned to the c2w matrix, which seems to represent the camera-to-world transform used in compute_warping_loss?

c2w = torch.eye(4, 4, device='cuda')            # 4x4 pose matrix (named c2w)
c2w[:3, :3] = torch.from_numpy(R).float()       # rotation from PnP
c2w[:3, 3] = torch.from_numpy(t[:, 0]).float()  # translation from PnP (t is 3x1)
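
For reference, here is my understanding of the PnP convention, checked with a quick illustrative example (made-up points and identity intrinsics; I am assuming something like cv2.solvePnP is used upstream):

import numpy as np
import cv2

# Illustrative check (not from the repo): recover a pose with solvePnP and verify
# that the returned R, t satisfy X_cam = R @ X_world + t, i.e. world-to-camera.
K = np.eye(3)
obj = np.array([[0., 0., 0.], [1., 0., 0.], [0., 1., 0.], [0., 0., 1.],
                [1., 1., 0.], [1., 0., 1.], [0., 1., 1.], [1., 1., 1.]])
R_gt, _ = cv2.Rodrigues(np.array([[0.1], [0.2], [0.3]]))
t_gt = np.array([[0.5], [0.0], [4.0]])
cam = R_gt @ obj.T + t_gt                 # ground-truth points in the camera frame
img = (cam[:2] / cam[2:]).T               # pinhole projection (K = I)
ok, rvec, tvec = cv2.solvePnP(obj, img, K, None)
R, _ = cv2.Rodrigues(rvec)
print(np.allclose(R @ obj.T + tvec, cam, atol=1e-4))   # True -> R, t are w2c
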
haksorus commented 1 day ago

Sorry for the late reply.

It's actually the world-to-camera matrix. Thus, the predicted/optimized poses, as well as the GT poses from the 3DGS cameras, are represented in w2c format. In this case, the matrix was simply given the wrong name; I will fix it soon.
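
For clarity, the rename amounts to the usual convention (an illustrative sketch, not the actual fix): the matrix built in the snippet above is numerically w2c, and the true camera-to-world matrix is simply its inverse.

import torch

# Illustrative helper (not from the repo): invert a 4x4 world-to-camera matrix
# to get the camera-to-world matrix of a rigid pose.
def w2c_to_c2w(w2c: torch.Tensor) -> torch.Tensor:
    R, t = w2c[:3, :3], w2c[:3, 3]
    c2w = torch.eye(4, dtype=w2c.dtype, device=w2c.device)
    c2w[:3, :3] = R.T          # inverse rotation
    c2w[:3, 3] = -R.T @ t      # inverse translation
    return c2w                 # equivalent to torch.linalg.inv(w2c) for rigid poses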

Thanks for noticing!

kongbia commented 1 day ago

Thanks for your reply, but I still have a minor question. In the compute_warping_loss function:

def compute_warping_loss(vr, qr, quat_opt, t_opt, pose, K, depth):
    # Compose the relative transform between the two poses
    warp = pose @ from_cam_tensor_to_w2c(torch.cat([quat_opt, t_opt], dim=0)).inverse()
    # Warp the rendered view and compare it against the query image
    warped_image = differentiable_warp(vr.unsqueeze(0), depth.unsqueeze(0), warp.unsqueeze(0), K.unsqueeze(0))
    loss = F.mse_loss(warped_image, qr.unsqueeze(0))

    return loss

warp first implements the c2w back-projection via the initial pose pose, and then the w2c projection via the optimized pose quat_opt, t_opt. However, pose in the code represents w2c, while from_cam_tensor_to_w2c(torch.cat([quat_opt, t_opt], dim=0)).inverse() represents c2w, which does not seem consistent with that principle.

haksorus commented 23 hours ago

Here, warp is defined as warp = w2c @ c2w. So in the differentiable_warp function, the part

    cam_points = warp @ world_points # (B, 4, H*W)

is equal to

    world_points = c2w @ world_points
    cam_points = w2c @ world_points  # (B, 4, H*W)
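
To make the chain concrete, here is a minimal self-contained sketch (this is not the repository's differentiable_warp; the back-projection with K, the variable names, and the assumption that depth lives in the source camera are mine):

import torch

# Illustrative sketch: pixel -> source camera ray (K^-1 and depth) -> world
# (c2w of the source pose) -> target camera (w2c of the other pose) -> pixel.
def warp_pixels(depth, K, w2c_target, c2w_source):
    H, W = depth.shape
    v, u = torch.meshgrid(torch.arange(H, dtype=torch.float32),
                          torch.arange(W, dtype=torch.float32), indexing='ij')
    pix = torch.stack([u, v, torch.ones_like(u)]).reshape(3, -1)      # (3, H*W)
    cam_src = torch.linalg.inv(K) @ pix * depth.reshape(1, -1)        # source camera frame
    cam_src = torch.cat([cam_src, torch.ones(1, H * W)], dim=0)       # homogeneous (4, H*W)
    warp = w2c_target @ c2w_source
    cam_tgt = warp @ cam_src            # same as w2c_target @ (c2w_source @ cam_src)
    proj = K @ cam_tgt[:3]              # project into the target image
    return proj[:2] / proj[2:].clamp(min=1e-8)                        # target pixel coords

Here cam_src plays the role of the points that warp is applied to in the snippet above, i.e. they are expressed in the source camera before c2w lifts them to world coordinates.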

I hope this helps.