spla-tam / SplaTAM

SplaTAM: Splat, Track & Map 3D Gaussians for Dense RGB-D SLAM (CVPR 2024)
https://spla-tam.github.io/
BSD 3-Clause "New" or "Revised" License

Confusion about 'transformed_params2depthplussilhouette' function #52

Closed RPFey closed 10 months ago

RPFey commented 10 months ago

Thank you for sharing the code. I am confused about the transformed_params2depthplussilhouette function here.

def transformed_params2depthplussilhouette(params, w2c, transformed_pts):
    rendervar = {
        'means3D': transformed_pts,
        # Depth and silhouette values are packed into the color channels
        'colors_precomp': get_depth_and_silhouette(transformed_pts, w2c),
        'rotations': F.normalize(params['unnorm_rotations']),
        'opacities': torch.sigmoid(params['logit_opacities']),
        'scales': torch.exp(torch.tile(params['log_scales'], (1, 3))),
        # Zero-initialized screen-space means, kept differentiable for the rasterizer
        'means2D': torch.zeros_like(params['means3D'], requires_grad=True, device="cuda") + 0
    }
    return rendervar

In the transform_to_frame function, transformed_pts are already the points transformed into the current camera frame. Why do we need to transform them again in get_depth_and_silhouette(transformed_pts, w2c)? Shouldn't we just use transformed_pts directly to compute the depth?

Buffyqsf commented 10 months ago

I think it's related to the render camera setting. The Gaussian renderer is constructed with Renderer(raster_settings=curr_data['cam']), so the render camera's view is always the first frame's pose (call it the view camera). Usually it is the identity, which makes this easy to see. But when it is not the identity, the depth from the scene to the view camera must be adjusted accordingly. I may not be expressing it very well, but I think the key point is that there is a view camera.
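This can be made concrete with a toy example (hypothetical numbers, not from the repo): when the view camera's w2c is the identity, applying it leaves a point's z-depth untouched, so the extra transform is a no-op; for any other view camera it changes the depth and is therefore necessary.

```python
import torch

# One point sitting at z = 2 in the frame the Gaussians were transformed into.
pt = torch.tensor([[0.0, 0.0, 2.0, 1.0]]).T  # homogeneous column vector

# Identity view camera (SplaTAM's usual case): depth is unchanged.
depth_identity = (torch.eye(4) @ pt)[2, 0]   # 2.0

# View camera shifted 1 unit along z: the same point now has depth 3.
w2c_shifted = torch.eye(4)
w2c_shifted[2, 3] = 1.0
depth_shifted = (w2c_shifted @ pt)[2, 0]     # 3.0
```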

Nik-V9 commented 10 months ago

Hi @RPFey, Thanks for your question! @Buffyqsf is right.

The input w2c for the transformed_params2depthplussilhouette function is the first_frame_w2c, the viewing camera. Ideally, if all the poses are with respect to the first frame (as is the case with SplaTAM), the input w2c to the transformed_params2depthplussilhouette function would be identity. However, to make sure the function is general and works with any viewing camera definition, we convert the world frame Gaussians to the viewing camera frame to compute the depth: https://github.com/spla-tam/SplaTAM/blob/a0bda58dd6fbf3e2ad31e40adc48514923bec4c0/utils/slam_helpers.py#L172
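For reference, a minimal sketch of what the linked helper does, assuming the (depth, 1, depth squared) channel layout used in utils/slam_helpers.py; details may differ slightly from the actual implementation:

```python
import torch

def get_depth_and_silhouette(pts_3D, w2c):
    """Pack per-Gaussian depth info into the 3 'color' channels.

    Sketch of the linked helper: transform world/frame points by w2c
    (a no-op when w2c is the identity) and store (depth, 1, depth^2).
    """
    N = pts_3D.shape[0]
    # Homogenize and transform into the viewing camera frame.
    ones = torch.ones(N, 1, dtype=pts_3D.dtype)
    pts4 = torch.cat([pts_3D, ones], dim=1)      # (N, 4)
    pts_cam = (w2c @ pts4.T).T                   # (N, 4)
    depth_z = pts_cam[:, 2:3]                    # z-depth per Gaussian

    depth_silhouette = torch.zeros(N, 3, dtype=pts_3D.dtype)
    depth_silhouette[:, 0:1] = depth_z           # channel 0: depth
    depth_silhouette[:, 1:2] = 1.0               # channel 1: silhouette
    depth_silhouette[:, 2:3] = depth_z ** 2      # channel 2: depth^2
    return depth_silhouette
```

With w2c equal to the identity, the returned depth is exactly the z-coordinate of transformed_pts, which is why the extra transform is invisible in the common case.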

RPFey commented 10 months ago

Thank you for your replies. That answers my question!