spla-tam / SplaTAM

SplaTAM: Splat, Track & Map 3D Gaussians for Dense RGB-D SLAM (CVPR 2024)
https://spla-tam.github.io/
BSD 3-Clause "New" or "Revised" License

Why is the projmatrix in cam the same as the first frame's? #118

Open Yoona12 opened 5 months ago

Yoona12 commented 5 months ago

Thanks for your great work! While reading the code, I had the following question. As the camera moves, the viewmatrix of the camera changes, and thus the projmatrix should change as well. But in the code, the cam stays the same as the first frame's the whole time, even as the camera moves.
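
For context: in a typical Gaussian-splatting rasterizer setup, the per-view projection passed to the rasterizer is composed from the camera extrinsics and intrinsics, which is why one would expect it to change with the camera. A minimal sketch of that expectation (function and variable names are mine, not SplaTAM's; row- vs. column-major conventions vary between codebases):

```python
import torch

def compose_full_proj(w2c: torch.Tensor, proj: torch.Tensor) -> torch.Tensor:
    """Compose the full per-view projection a rasterizer consumes:
    world -> camera (the viewmatrix), then camera -> clip space.
    If w2c were rebuilt from every new camera pose, this product
    would change every frame."""
    # w2c: (4, 4) world-to-camera extrinsics (the "viewmatrix")
    # proj: (4, 4) intrinsics-derived perspective projection
    return proj @ w2c
```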

Nik-V9 commented 5 months ago

Hi, thanks for the interest in our work!

In SplaTAM, we fix the viewing camera to the first frame (this is the world frame). For subsequent cameras, rather than defining a new viewing camera, we project the Gaussians to the first frame (the world frame). Hence, we don't need to change the viewing camera for the rasterizer.

Please refer to the following comments and let me know if this answers your query:

  1. https://github.com/spla-tam/SplaTAM/issues/52#issuecomment-1885255397
  2. https://github.com/spla-tam/SplaTAM/issues/28#issuecomment-1855166250
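
In code terms, the fixed-camera idea looks roughly like this (a minimal sketch of the transformation step, not the exact SplaTAM implementation; `rasterize` and `first_frame_cam` are placeholders for the rasterizer call and the camera object built from frame 0):

```python
import torch

def render_current_frame(means3D_world, rel_w2c, first_frame_cam, rasterize):
    """Move the world-frame Gaussian centers into the current camera's
    frame, then rasterize with the viewing camera still fixed to the
    first frame. The result is what the current camera would see."""
    # Homogenize the (N, 3) centers to (N, 4)
    ones = torch.ones(means3D_world.shape[0], 1, device=means3D_world.device)
    pts4 = torch.cat((means3D_world, ones), dim=1)
    # Apply the current frame's world-to-camera pose: (4, 4) @ (4, N)
    transformed_pts = (rel_w2c @ pts4.T).T[:, :3]
    # The viewing camera (viewmatrix/projmatrix) never changes
    return rasterize(transformed_pts, first_frame_cam)
```
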
DeepDuke commented 5 months ago

Hi, I have the same question. Before rendering, the Gaussians are transformed to the current frame. If the viewing matrix is fixed to the first frame, shouldn't we render the transformed Gaussians in the current frame? It looks strange to render from the first frame's view. Could you please explain a bit more? Thanks a lot! @Nik-V9

DeepDuke commented 5 months ago

Another question is about calculating the loss. If the rendered images are viewed from the first frame, how do you compute the loss against ground-truth images that are observed at different frames?

Nik-V9 commented 2 months ago

Hi, as I mentioned before, the reference frame is fixed to the first frame, and the Gaussians (point clouds) are just transformed into the reference frame for rasterization.

"For subsequent cameras, rather than defining a new viewing camera, we project the Gaussians to the first frame (the world frame). Hence, we don't need to change the viewing camera for the rasterizer."

Hence, the renderings for the current view will represent the current camera, and you can apply rendering losses.
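
As a concrete illustration, a simplified per-frame rendering loss could look like the sketch below (my own simplification, not SplaTAM's actual loss, which weights and masks the color and depth terms differently between tracking and mapping; the silhouette threshold here is an assumed value):

```python
import torch

def rgbd_render_loss(rendered_rgb, rendered_depth, gt_rgb, gt_depth,
                     silhouette, sil_thresh=0.99):
    """Simplified L1 rendering loss for one frame. Because the transformed
    Gaussians were rasterized through the fixed first-frame camera, the
    rendering already corresponds to the current camera and can be compared
    directly against that frame's ground truth."""
    # Trust only well-observed pixels (high rendered silhouette)
    mask = (silhouette > sil_thresh).detach()        # (H, W) boolean
    color_l1 = torch.abs(rendered_rgb - gt_rgb)      # (3, H, W)
    depth_l1 = torch.abs(rendered_depth - gt_depth)  # (H, W)
    return color_l1[:, mask].mean() + depth_l1[mask].mean()
```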

Rashfu commented 2 months ago

@Nik-V9 Thank you for your explanation of coordinate transformations in the related issues. However, I'm not sure if my understanding is correct, so I tried to summarize it as follows. Please point out any mistakes.

1. In the get_pointcloud function, the point cloud for the first frame undergoes c2w @ pts4.T, which means that params['means3D'] always stores positions in the world coordinate system, rather than treating the first frame directly as the world coordinate system.
2. If the rendered viewpoint is always fixed at the first frame, then the 3D Gaussians in the world coordinate system need to be transformed, for each rendering, by the pose of the current frame relative to the first frame (similar to relative motion). This is also the meaning of params['cam_unnorm_rots'] and params['cam_trans'] defined here.
3. Therefore, in the transform_to_frame function, rel_w2c here essentially transforms the 3D Gaussians into the camera coordinate system of the first frame to get transformed_pts, rather than the camera coordinate system of the current frame.
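
To make my reading concrete, here is a paraphrase of the transform in code (a sketch of my understanding, not a verbatim copy of the repo; I assume build_rotation is the repo's helper that maps a (1, 4) unit quaternion to a (1, 3, 3) rotation matrix):

```python
import torch
import torch.nn.functional as F

def transform_to_frame_sketch(params, time_idx, build_rotation):
    """Build the current frame's world-to-camera pose from the optimized
    quaternion and translation, then move the world-frame Gaussian
    centers into that pose's camera frame."""
    # Normalize the stored (unnormalized) quaternion for this frame
    cam_rot = F.normalize(params['cam_unnorm_rots'][..., time_idx].reshape(1, 4))
    cam_tran = params['cam_trans'][..., time_idx].reshape(3)
    # Assemble the 4x4 relative world-to-camera transform, rel_w2c
    rel_w2c = torch.eye(4, device=cam_tran.device)
    rel_w2c[:3, :3] = build_rotation(cam_rot)[0]  # quaternion -> (3, 3)
    rel_w2c[:3, 3] = cam_tran
    # Homogenize the world-frame centers and apply the transform
    pts = params['means3D']                                # (N, 3), world frame
    ones = torch.ones(pts.shape[0], 1, device=pts.device)
    pts4 = torch.cat((pts, ones), dim=1)                   # (N, 4)
    return (rel_w2c @ pts4.T).T[:, :3]                     # (N, 3), camera frame
```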

But based on my understanding, why does transformed_params2depthplussilhouette additionally require curr_data['w2c'], while transformed_params2rendervar does not? Aren't the transformed_pts already in the first frame's coordinate system?

Thanks in advance!