muskie82 / MonoGS

[CVPR'24 Highlight] Gaussian Splatting SLAM
https://rmurai.co.uk/projects/GaussianSplattingSLAM/

Questions about the pose optimization (paper and code) #93


leblond14u commented 1 month ago

Dear authors, dear community,

Is it possible for you to clarify my understanding of how the pose optimization works?

In my understanding, the pose is optimized via gradient descent to minimize the reprojection error, where the Jacobian with respect to the pose is derived analytically (equations 3 to 6 in section 3.2).

  1. However, I'm a bit confused by section 3.3.1, which presents an L1 loss to minimize the reprojection error. I can't figure out how the Jacobian can be used with the L1 loss. Could you clarify this part?
  2. To my understanding, this optimization is done via the rasterizer submodule, but I don't see any output of the estimated transform. Does it directly modify the viewpoint_camera.cam_rot_delta and viewpoint_camera.cam_trans_delta parameters during rasterization?
    I'm trying to reuse your rasterizer with the original 3DGS code (with the GT path), but I can't see the delta parameters updating. Could you indicate how to get a pose estimate out of the rasterizer?

Thanks in advance, Best regards,

Hugo

identxxy commented 1 month ago

I think the viewpoint_camera.cam_rot_delta and viewpoint_camera.cam_trans_delta are optimizable params updated here with an Adam optimizer.

https://github.com/muskie82/MonoGS/blob/6c9254c319d8bff5caeef65259e6bb0941a9b9f6/utils/slam_frontend.py#L128
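For reference, a minimal sketch of that pattern (not the exact MonoGS code; the class and learning rates here are illustrative): the two deltas are plain tensors with requires_grad=True, registered as parameter groups in an Adam optimizer, and the rasterizer's backward pass fills their .grad before step() is called.

```python
import torch

# Sketch: pose deltas as optimizable tensors (illustrative values and lrs).
class Viewpoint:
    def __init__(self):
        # Small axis-angle and translation corrections around the current
        # pose estimate, initialized to zero for each frame.
        self.cam_rot_delta = torch.zeros(3, requires_grad=True)
        self.cam_trans_delta = torch.zeros(3, requires_grad=True)

viewpoint = Viewpoint()
pose_optimizer = torch.optim.Adam(
    [
        {"params": [viewpoint.cam_rot_delta], "lr": 3e-3},
        {"params": [viewpoint.cam_trans_delta], "lr": 1e-3},
    ]
)
```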

I've tried using the rasterizer with the original 3DGS codebase. My experience is that you cannot optimize the camera poses together with the map... The resulting map gets more and more blurry because the camera is shaking. Even though the camera pose only receives small refinements, the original pose has already left a large imprint on the map, which is hard to eliminate or amend.

I think the frontend/backend design in MonoGS is necessary and smart, since it avoids this problem: when tracking the camera, the map is fixed; when mapping, you trust the pose (see the sketch below).

This is only my personal experience; maybe there are other ways to optimize camera poses and the map simultaneously.
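To make the split concrete, here is a runnable toy sketch of the idea (hypothetical classes and a stubbed render(); the real renderer is the CUDA rasterizer): two optimizers own disjoint parameters, and each phase steps only one of them.

```python
import torch

# Toy sketch of the tracking/mapping split (not the MonoGS code).

class Gaussians:
    def __init__(self, n=100):
        self.means = torch.randn(n, 3, requires_grad=True)

class Viewpoint:
    def __init__(self):
        self.cam_rot_delta = torch.zeros(3, requires_grad=True)
        self.cam_trans_delta = torch.zeros(3, requires_grad=True)

def render(viewpoint, gaussians):
    # Stand-in for the rasterizer: any function differentiable in both
    # the pose deltas and the map parameters.
    return gaussians.means + viewpoint.cam_trans_delta

gaussians, viewpoint = Gaussians(), Viewpoint()
target = torch.zeros(100, 3)
pose_opt = torch.optim.Adam(
    [viewpoint.cam_rot_delta, viewpoint.cam_trans_delta], lr=1e-3)
map_opt = torch.optim.Adam([gaussians.means], lr=1e-2)

# Tracking: the map is fixed, because only the pose deltas are stepped.
for _ in range(10):
    loss = torch.abs(render(viewpoint, gaussians) - target).mean()  # L1
    pose_opt.zero_grad(); map_opt.zero_grad()
    loss.backward()
    pose_opt.step()          # map_opt is NOT stepped while tracking

# Mapping: the pose is trusted; only the Gaussian parameters are stepped.
for _ in range(10):
    loss = torch.abs(render(viewpoint, gaussians) - target).mean()
    pose_opt.zero_grad(); map_opt.zero_grad()
    loss.backward()
    map_opt.step()           # pose deltas stay fixed while mapping
```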

leblond14u commented 1 month ago

Hi, thanks for your answer.

Have you succeeded in getting a pose estimate, though?

For now, printing the viewpoint_cam deltas doesn't show any update to them during training. So, to my latest understanding, I should be able to update the pose estimate by running the tracking() function inside the base 3DGS training loop with the MonoGS rasterizer, right?

According to section 3.2 of the paper, the gradient is thus provided by the rasterizer's Jacobian computation and descended by the Adam optimizer in the tracking() function. Technically, the Jacobian is derived from the L1 losses of section 3.3.1, which capture the errors between the Gaussian reprojections and the captured photometry and depth.
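If it helps, here is a conceptual sketch of such a tracking loop (a toy, not the MonoGS code: render() is a stand-in for the rasterizer, whose real backward implements the analytic pose Jacobian). Note that Adam only ever sees a gradient; for an L1 loss that gradient is the sign of the residual propagated through the Jacobian, which is how the section 3.2 derivation and the section 3.3.1 loss fit together. After each step, the deltas are folded back into the pose and reset, so the next iteration linearizes around the updated estimate.

```python
import torch

# Conceptual per-frame tracking loop (sketch, not the MonoGS code).

def skew(w):
    z = torch.zeros((), dtype=w.dtype)
    return torch.stack([torch.stack([z, -w[2], w[1]]),
                        torch.stack([w[2], z, -w[0]]),
                        torch.stack([-w[1], w[0], z])])

def so3_exp(w):
    # Rodrigues' formula (used only in the no-grad pose update below).
    theta = torch.linalg.norm(w)
    if theta < 1e-8:
        return torch.eye(3)
    K = skew(w / theta)
    return torch.eye(3) + torch.sin(theta) * K + (1 - torch.cos(theta)) * (K @ K)

def render(R, t, rot_delta, trans_delta):
    # Stand-in: transform one "Gaussian mean" with a first-order pose
    # perturbation, so the loss is differentiable in the deltas.
    Rd = (torch.eye(3) + skew(rot_delta)) @ R
    point = torch.tensor([0.5, -0.2, 2.0])
    return Rd @ point + t + trans_delta

R, t = torch.eye(3), torch.zeros(3)            # current pose estimate
rot_delta = torch.zeros(3, requires_grad=True)
trans_delta = torch.zeros(3, requires_grad=True)
opt = torch.optim.Adam([rot_delta, trans_delta], lr=1e-3)
gt = torch.tensor([0.4, -0.1, 2.1])            # "observation"

for _ in range(100):
    loss = torch.abs(render(R, t, rot_delta, trans_delta) - gt).sum()  # L1
    opt.zero_grad()
    loss.backward()
    opt.step()
    with torch.no_grad():                      # fold deltas into the pose
        R = so3_exp(rot_delta) @ R
        t = t + trans_delta
        rot_delta.zero_()
        trans_delta.zero_()
```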

If my statements above are right, the only thing left for me to understand is how the Jacobian gradient is communicated between the rasterizer and the optimizer. Could you explain this link?

Many thanks, Best,

identxxy commented 1 month ago

I used some prior poses and tried optimizing the map and refining the poses simultaneously, which turned out to be a really bad idea. T^T

I should be able to update the pose estimate by running the tracking() function inside the base 3DGS training loop with the MonoGS rasterizer

Yes, I think so.

Technically, the Jacobian is derived from the L1 losses of section 3.3.1, which capture the errors between the Gaussian reprojections and the captured photometry and depth.

Yes, that's in the paper, but in the code I think it's based on the silhouette; see my question here: https://github.com/muskie82/MonoGS/issues/90#issue-2287296083.

how the Jacobian gradient is communicated between the rasterizer and the optimizer

Well, I am also not clear about this part... But my understanding and intuition is that if the rasterizer has a position gradient for every Gaussian seen from one camera, then the opposite direction of the mean of all the Gaussian gradients, projected onto the 2D camera plane, is the camera gradient. The optimizer just uses this gradient and some learning rate to take a step.
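For what it's worth, the translational part of that intuition can be written down (a sketch, assuming the loss touches the camera only through the camera-frame Gaussian means, and parameterizing the camera by rotation R and center c, so that the camera-frame mean of Gaussian i is mu_{C,i} = R(mu_{W,i} - c)):

```latex
% Chain rule for the camera-center gradient, given
% mu_{C,i} = R (mu_{W,i} - c):
\[
\frac{\partial L}{\partial \mathbf{c}}
  = -\mathbf{R}^{\top} \sum_i \frac{\partial L}{\partial \boldsymbol{\mu}_{C,i}}
  = -\sum_i \frac{\partial L}{\partial \boldsymbol{\mu}_{W,i}},
\qquad \text{since} \quad
\frac{\partial L}{\partial \boldsymbol{\mu}_{W,i}}
  = \mathbf{R}^{\top} \frac{\partial L}{\partial \boldsymbol{\mu}_{C,i}} .
\]
```

So the camera-center gradient is exactly the negated sum of the per-Gaussian world-space position gradients, matching the "opposite direction" intuition. The real backward also carries terms through the rotation, the projection, and the covariances, so this is only part of the story.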

WFram commented 1 month ago

how the Jacobian gradient is communicated between the rasterizer and the optimizer

I think it's defined by the order in which the input tensors are specified when calling forward.

The implementation of backward outputs the gradients, writing into the .grad attributes of the input tensors, in the same order as they were passed to forward.

That's why viewpoint_camera.cam_rot_delta and viewpoint_camera.cam_trans_delta are passed into forward even though they are not used there (see the definition of _RasterizeGaussians in submodules/diff-gaussian-rasterization-w-pose/diff_gaussian_rasterization/__init__.py).

When .step() is called on the Adam optimizer, the values of these tensors get updated according to the gradients stored in their .grad attributes.
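A toy version of that contract, to make the plumbing visible (the gradient values below are invented for illustration; only the ordering matters):

```python
import torch

# Sketch of the forward/backward positional contract (illustrative, not the
# MonoGS code): a custom autograd Function can take inputs that forward never
# uses, as long as backward returns one gradient per input, in the same
# order. Those gradients land in the inputs' .grad, and Adam applies them.
class ToyRasterizer(torch.autograd.Function):
    @staticmethod
    def forward(ctx, means, rot_delta, trans_delta):
        # rot_delta / trans_delta are not used here; in the real rasterizer
        # their gradients are computed analytically in the CUDA backward.
        ctx.save_for_backward(means)
        return means.sum()

    @staticmethod
    def backward(ctx, grad_out):
        (means,) = ctx.saved_tensors
        grad_means = grad_out * torch.ones_like(means)
        grad_rot = grad_out * torch.ones(3)     # invented, for illustration
        grad_trans = grad_out * torch.ones(3)   # invented, for illustration
        # One gradient per forward input, in the same order:
        return grad_means, grad_rot, grad_trans

means = torch.randn(5, 3, requires_grad=True)
rot_delta = torch.zeros(3, requires_grad=True)
trans_delta = torch.zeros(3, requires_grad=True)
opt = torch.optim.Adam([rot_delta, trans_delta], lr=1e-2)

loss = ToyRasterizer.apply(means, rot_delta, trans_delta)
loss.backward()                          # fills .grad via the ordering above
print(rot_delta.grad, trans_delta.grad)
opt.step()                               # the deltas move, despite being
                                         # unused in forward
```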

leblond14u commented 1 month ago

@identxxy I have a question concerning your attempt at getting a pose estimate. Have you managed to get "rather good" pose estimates? For now, when I try to track my camera's pose with my adapted tracking() function in my vanilla 3DGS environment, I get weird convergence results. I opened a new issue #98 about this.

identxxy commented 1 month ago

I was trying to use tracking() to get refined poses, since I already had "rather good" poses; it's just that some details didn't align very well... It turns out that optimizing both the poses and the map makes both worse, which is actually intuitive: everything just messes up. So I think that's why MonoGS chose the design of separating tracking() and mapping().