Closed QZH-00 closed 5 months ago
Hi,
The initialize function is only called to initialise the SLAM pipeline, not for every frame (it is called again whenever a reset is triggered).
So only the first frame’s camera pose is initialised at the ground truth (to make it easier to visualise against the GT poses). You can initialise it anywhere, really (e.g. at identity).
All other frames are initialised from the previous frame’s pose, as you can see in the tracking function: https://github.com/muskie82/MonoGS/blob/main/utils/slam_frontend.py#L130
init_from_dataset only sets the estimated camera pose to identity. The GT pose is also stored, but only for convenience (for evaluation etc.).
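The initialisation logic described above can be sketched roughly as follows. This is a minimal illustration, not the MonoGS code: the function initial_pose, its arguments, and the use of 4x4 NumPy matrices are all assumptions made for the example.

```python
import numpy as np

def initial_pose(frame_idx, est_poses, gt_poses, use_gt_for_first=True):
    """Return the 4x4 pose used as the starting point when tracking frame_idx.

    est_poses: dict mapping frame index -> estimated pose (hypothetical container)
    gt_poses:  sequence of ground-truth poses, kept only for convenience
    """
    if frame_idx == 0:
        # Only the first frame is anchored. Using GT here just puts the
        # estimated trajectory in the same frame as the GT trajectory for
        # visualisation; identity would work equally well.
        return gt_poses[0].copy() if use_gt_for_first else np.eye(4)
    # Every later frame starts from the previous frame's *estimated* pose,
    # never from its own ground truth, and is then refined by optimisation.
    return est_poses[frame_idx - 1].copy()
```

With this structure the ground truth of frames 1, 2, ... is never read during tracking, which is why the pose gradients still matter after the first frame.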
Hope this clarifies your question!
Thank you very much for your reply! I have one more question. We consider the ground truth to be the best pose estimate, so is the ground truth also the best choice for optimising the parameters and rendering images for 3DGS? To put it another way, suppose I set the pose to the ground truth (or close to it) for every frame: will I get better rendering results? In my experiments, I found that when the ATE RMSE is extremely small, the PSNR actually becomes lower. Why is that?
Given your greater experience and knowledge, I'd appreciate a reply!
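For reference, the two metrics in this question measure different things: ATE RMSE compares estimated camera positions against the GT trajectory, while PSNR compares rendered images against the captured images. A minimal sketch of both, assuming the two trajectories are already expressed in the same frame (real evaluations usually align them first) and that images are float arrays in [0, 1]; the function names are chosen for this example only:

```python
import numpy as np

def ate_rmse(est_xyz, gt_xyz):
    """RMSE of per-frame translation error between two pre-aligned trajectories."""
    est_xyz = np.asarray(est_xyz, dtype=float)
    gt_xyz = np.asarray(gt_xyz, dtype=float)
    per_frame_error = np.linalg.norm(est_xyz - gt_xyz, axis=1)
    return float(np.sqrt(np.mean(per_frame_error ** 2)))

def psnr(rendered, reference, max_val=1.0):
    """Peak signal-to-noise ratio in dB between a render and a reference image."""
    diff = np.asarray(rendered, dtype=float) - np.asarray(reference, dtype=float)
    mse = np.mean(diff ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(max_val ** 2 / mse))
```

Nothing in the PSNR computation refers to the GT poses, so a lower ATE does not by itself guarantee a higher PSNR.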
It depends on many factors, but real-world ground truth may not be the best choice for reconstruction, since the GT poses are not pixel-perfectly aligned with the images. Optimising the camera poses adds additional slack to the system, so that some real-world imperfections can be explained by moving the camera poses slightly.
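A toy 1D example may make this concrete. Everything here is an assumption made for illustration: the "scene" is a sine signal, the "camera pose" is a single sub-pixel shift, the recorded GT shift is off by 0.4 pixels, and the pose is found by a simple grid search rather than by the gradient-based optimisation used in a real system.

```python
import numpy as np

def render(scene, shift):
    """Render a 1D 'image' by sampling the scene at a sub-pixel camera shift."""
    x = np.arange(scene.size, dtype=float)
    return np.interp(x + shift, x, scene)

def psnr(a, b, max_val=1.0):
    mse = np.mean((a - b) ** 2)
    return float("inf") if mse == 0 else float(10.0 * np.log10(max_val ** 2 / mse))

x = np.arange(200)
scene = 0.5 + 0.5 * np.sin(2.0 * np.pi * x / 25.0)

true_shift = 3.0  # pose that actually produced the observed image
gt_shift = 3.4    # recorded "ground truth", slightly misaligned with the image
observed = render(scene, true_shift)

# Let the pose move: search around the GT label for the lowest photometric error.
candidates = np.linspace(gt_shift - 1.0, gt_shift + 1.0, 201)
errors = [np.mean((render(scene, s) - observed) ** 2) for s in candidates]
best_shift = float(candidates[int(np.argmin(errors))])

psnr_gt = psnr(render(scene, gt_shift), observed)     # pose pinned to the GT label
psnr_opt = psnr(render(scene, best_shift), observed)  # pose free to move

print(f"best shift: {best_shift:.2f}, offset from GT label: {abs(best_shift - gt_shift):.2f}")
print(f"PSNR at GT pose: {psnr_gt:.1f} dB, PSNR at optimised pose: {psnr_opt:.1f} dB")
```

In this toy setup the pose that renders best sits away from the recorded GT label, so its error with respect to the GT is non-zero even though its PSNR is higher than at the GT pose.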
If you try using GT poses for Replica, which is a synthetic dataset, I expect all metrics to be better. Otherwise you can run SfM such as COLMAP, just like the original 3D Gaussian Splatting.
Hope this makes sense!
Thank you for your reply! When I did more experiments on the EuRoC dataset, I found that there is a big difference in performance between monocular and stereo. The point cloud in monocular mode is very messy, while the point cloud in stereo mode is very regular (similar to the point cloud of classical methods like DSO). So, does this mean that using a direct-method VO to get the pose and the point cloud as an initialisation for 3DGS is a direction worth trying? For classical monocular VO, will the lack of scale and accurate depth have a bad effect on the initialisation? Looking forward to your reply! @muskie82 @rmurai0610 Thanks!
Hi,
Yes, you can bootstrap our system with an external tracker/VO’s poses or initial points. The focus of our work is a purely 3DGS-based SLAM system, to address 3DGS’s intrinsic properties for the camera localisation task, but in practice you can use external pose/depth priors for an immediate performance boost.
Since the initial question of this issue is solved, I will close this issue.
Thanks for this great work! While reading the code, I have been trying to figure out how the camera pose is initialised and optimised. The only part of the code I found that was related to initialisation was this one, and it uses the ground-truth value from the dataset for initialisation:
viewpoint = Camera.init_from_dataset(self.dataset, cur_frame_idx, projection_matrix)
and
viewpoint.update_RT(viewpoint.R_gt, viewpoint.T_gt)
So I'm confused as to why the ground-truth value is used for initialisation, but the camera gradient is still calculated and optimised afterwards. Isn't the ground-truth value already the optimal value for the system? This may be a stupid question, but I would like to get an explanation from you. Thanks!