Yoona12 opened this issue 5 months ago
Hi, thanks for the interest in our work!
In SplaTAM, we fix the viewing camera to the first frame (this is the world frame). For subsequent cameras, rather than defining a new viewing camera, we project the Gaussians to the first frame. Hence, we don't need to change the viewing camera for the rasterizer.
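To make that concrete, here is a minimal sketch of the idea, assuming world-frame Gaussian centers and a per-frame camera quaternion and translation (the function names are illustrative, not the repo's exact API):

```python
import torch

def quat_to_rotmat(q):
    """Convert a normalized quaternion (w, x, y, z) to a 3x3 rotation matrix."""
    w, x, y, z = q
    return torch.stack([
        torch.stack([1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)]),
        torch.stack([2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)]),
        torch.stack([2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)]),
    ])

def gaussians_in_current_cam(means_world, cam_unnorm_rot, cam_tran):
    """Express world-frame (= first-frame) Gaussian centers in the current
    camera's coordinates, so the rasterizer's viewing camera can stay fixed."""
    q = cam_unnorm_rot / cam_unnorm_rot.norm()    # normalize the quaternion
    rel_w2c = torch.eye(4)
    rel_w2c[:3, :3] = quat_to_rotmat(q)
    rel_w2c[:3, 3] = cam_tran
    ones = torch.ones(means_world.shape[0], 1)
    pts4 = torch.cat([means_world, ones], dim=1)  # homogeneous coordinates
    return (rel_w2c @ pts4.T).T[:, :3]            # centers as seen by the current camera
```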
Please refer to the following comments and let me know if this answers your query:
Hi, I have the same question. Before rendering, the Gaussians are transformed to the current frame. If the viewing matrix is fixed to the first frame, shouldn't we render the transformed Gaussians in the current frame? It seems strange to render from the first frame's view. Could you please explain a bit more? Thanks a lot! @Nik-V9
Another question is about calculating the loss. If the rendered images are viewed from the first frame, how do you compute the loss against ground-truth images that are observed at different frames?
Hi, as I mentioned before, the reference frame is fixed to the first frame, and the Gaussians (point clouds) are simply transformed into that reference frame for rasterization.
"For subsequent cameras, rather than defining a new viewing camera, we project the Gaussians to the first frame (the world frame). Hence, we don't need to change the viewing camera for the rasterizer."
Hence, the renderings for the current view correspond to the current camera, and you can apply rendering losses against that frame's ground truth.
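So the loss is just a per-frame comparison between the render (which already depicts the current view) and that frame's ground-truth image. A hedged sketch, where `transform_to_frame` and `rasterize` are placeholders standing in for the repo's actual transform and rendering calls:

```python
import torch.nn.functional as F

def per_frame_loss(params, fixed_cam, curr_data, time_idx,
                   transform_to_frame, rasterize):
    """Illustrative only: the two callables stand in for SplaTAM's transform
    and rasterization steps; this is not the exact repo API."""
    # Move the Gaussians into the current camera's coordinates ...
    transformed_pts = transform_to_frame(params, time_idx)
    # ... and rasterize through the one fixed camera (set up at frame 0).
    rendered_rgb = rasterize(fixed_cam, params, transformed_pts)
    # The render now depicts the current view, so compare it directly against
    # the ground truth captured at this frame.
    return F.l1_loss(rendered_rgb, curr_data['im'])
```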
@Nik-V9 Thank you for your explanation of coordinate transformations in the related issues. However, I'm not sure if my understanding is correct, so I tried to summarize my understanding as follows. Please point out if there are any mistakes.
In the `get_pointcloud` function, the point cloud for the first frame undergoes `c2w @ pts4.T`, which means that `params['means3D']` always stores positions in the world coordinate system, rather than treating the first frame directly as the world coordinate system. If the rendered viewpoint is always fixed at the first frame, then the 3D Gaussians in the world coordinate system need to be transformed by the pose of the current frame relative to the first frame for each rendering (similar to relative motion). This is also the meaning of `params['cam_unnorm_rots']` and `params['cam_trans']` defined here. Therefore, in the `transform_to_frame` function, `rel_w2c` essentially transforms the 3D Gaussians into the camera coordinate system of the first frame to obtain `transformed_pts`, rather than the camera coordinate system of the current frame.
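For concreteness, the initialization step at the start of this summary might be sketched like this (a hypothetical back-projection helper following the `c2w @ pts4.T` convention from `get_pointcloud`, not the repo's exact code):

```python
import torch

def get_pointcloud_sketch(depth, fx, fy, cx, cy, c2w):
    """Back-project a depth map to camera-frame points, then lift them to the
    world frame via c2w; these world-frame points seed params['means3D']."""
    H, W = depth.shape
    v, u = torch.meshgrid(torch.arange(H), torch.arange(W), indexing='ij')
    # Pinhole back-projection: pixel grid -> camera-frame 3D points.
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    pts_cam = torch.stack([x, y, depth], dim=-1).reshape(-1, 3)
    # Homogeneous coordinates, then the world frame: c2w @ pts4.T.
    pts4 = torch.cat([pts_cam, torch.ones(pts_cam.shape[0], 1)], dim=1)
    return (c2w @ pts4.T).T[:, :3]
```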
But based on my understanding, why does `transformed_params2depthplussilhouette` additionally require `curr_data['w2c']`, while `transformed_params2rendervar` does not? Aren't the `transformed_pts` already in the first frame's coordinate system?
Thanks in advance!
Thanks for your great work! While reading the code, I had the following question. As the camera moves, the view matrix of the camera changes, and thus the projection matrix changes as well. But in the code, even as the camera moves, the cam remains the same as the first frame the entire time.
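Right: the rasterizer's camera can be built once from the first frame's pose and then reused for every frame, because the Gaussians themselves are moved instead. A rough sketch of such a fixed setup, using the public `diff_gaussian_rasterization` settings object (SplaTAM uses a fork, so the exact fields and conventions may differ slightly):

```python
import torch
from diff_gaussian_rasterization import GaussianRasterizationSettings

def setup_fixed_camera(w, h, fx, fy, cx, cy, first_frame_w2c, near=0.01, far=100.0):
    """Build the rasterization settings once from the first frame's pose
    (identity in a first-frame-as-world convention); later frames reuse this
    camera unchanged."""
    w2c = first_frame_w2c.float()
    # OpenGL-style projection assembled from the pinhole intrinsics.
    proj = torch.zeros(4, 4)
    proj[0, 0] = 2 * fx / w
    proj[1, 1] = 2 * fy / h
    proj[0, 2] = -(w - 2 * cx) / w
    proj[1, 2] = -(h - 2 * cy) / h
    proj[2, 2] = far / (far - near)
    proj[2, 3] = -(far * near) / (far - near)
    proj[3, 2] = 1.0
    return GaussianRasterizationSettings(
        image_height=h,
        image_width=w,
        tanfovx=w / (2 * fx),
        tanfovy=h / (2 * fy),
        bg=torch.zeros(3),
        scale_modifier=1.0,
        viewmatrix=w2c.T,              # fixed: the first frame's view
        projmatrix=(proj @ w2c).T,     # fixed full projection
        sh_degree=0,
        campos=torch.inverse(w2c)[:3, 3],
        prefiltered=False,
        debug=False,
    )
```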