spla-tam / SplaTAM

SplaTAM: Splat, Track & Map 3D Gaussians for Dense RGB-D SLAM (CVPR 2024)
https://spla-tam.github.io/
BSD 3-Clause "New" or "Revised" License

Query regarding constant camera intrinsics #108

Closed whwh747 closed 6 months ago

whwh747 commented 7 months ago

Thank you for your excellent work. I would like to ask why the 'cam' in each frame's 'curr_data' is the same as the 'cam' of the first frame.

curr_data = {'cam': cam, 'im': color, 'depth': depth, 'id': iter_time_idx, 'intrinsics': intrinsics, 
                     'w2c': first_frame_w2c, 'iter_gt_w2c_list': curr_gt_w2c}

AlexMorgand commented 6 months ago

I think cam is mostly used for the intrinsic parameters as it's assumed they are the same throughout the sequence (I suppose you're in the iphone_splatam file?)

You'll get the current pose from iter_time_idx and curr_gt_w2c.

cam is mostly passed to the Renderer:

im, radius, _, = Renderer(raster_settings=curr_data['cam'])(**rendervar)

See: https://github.com/graphdeco-inria/diff-gaussian-rasterization/blob/59f5f77e3ddbac3ed9db93ec2cfe99ed6c5d121d/diff_gaussian_rasterization/__init__.py

whwh747 commented 6 months ago

Thank you for your reply, but the camera center is always the first frame's camera center. What could be the reason for this? @AlexMorgand

AlexMorgand commented 6 months ago

If we assume that all frames come from the same camera, the camera center is not supposed to change.

Not sure if I get your question :(

whwh747 commented 6 months ago

I'm very sorry. As a beginner, my understanding is that as the camera moves, the center of the camera would also change. Is that not the case?

AlexMorgand commented 6 months ago

Got it, no worries. Let me try to guide you through it a bit. Maybe you're familiar with the pinhole camera model, but just to be sure, here is a quick explanation.

TL;DR: Parameters like the camera center and focal length are hardware-specific (like the lens), so in most cases we can assume they don't change because they are intrinsic to the camera. The camera moving is an "external" change (extrinsic). In a video, every image is captured by the same camera, so we can assume that the internal parameters of the camera don't change (note: this is not always the case, as the calibration can be affected by external factors, and the parameters are often estimated numerically).

Have a look at this reference for a nice explanation: https://ksimek.github.io/2013/08/13/intrinsic/

Basically, a camera has intrinsic and extrinsic parameters. The intrinsics are the specification of the camera, i.e. the calibration parameters (focal length, camera center/principal point, distortion).

The extrinsics (https://ksimek.github.io/2012/08/22/extrinsic/) describe the pose of the camera, i.e. rotation + translation (and sometimes scale).
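
To make the split concrete, here is a minimal sketch of pinhole projection in plain Python. It is only an illustration, not SplaTAM code: the helper names (matvec, project) and all numeric values are made up, and lens distortion is ignored. The same intrinsic matrix K is reused for both frames, and only the world-to-camera pose (w2c) changes.

```python
def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector (list)."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def project(K, w2c, point_world):
    """Project a 3D world point to pixel coordinates with a pinhole model.

    K   : 3x3 intrinsic matrix (focal lengths and principal point).
    w2c : 4x4 world-to-camera transform (the extrinsics / pose).
    """
    # Extrinsics: move the point from world coordinates into the camera frame.
    x, y, z = matvec(w2c, list(point_world) + [1.0])[:3]
    # Intrinsics: map the camera-frame point onto the image plane.
    u, v, w = matvec(K, [x, y, z])
    return (u / w, v / w)


# Hypothetical intrinsics: fx = fy = 500 px, principal point at (320, 240).
K = [[500.0, 0.0, 320.0],
     [0.0, 500.0, 240.0],
     [0.0, 0.0, 1.0]]

# Frame 0: camera at the world origin (identity pose).
w2c_first = [[1.0, 0.0, 0.0, 0.0],
             [0.0, 1.0, 0.0, 0.0],
             [0.0, 0.0, 1.0, 0.0],
             [0.0, 0.0, 0.0, 1.0]]

# Later frame: camera moved 0.1 m along +x, so the world-to-camera
# translation along x is -0.1. K is untouched.
w2c_moved = [[1.0, 0.0, 0.0, -0.1],
             [0.0, 1.0, 0.0, 0.0],
             [0.0, 0.0, 1.0, 0.0],
             [0.0, 0.0, 0.0, 1.0]]

point = (0.0, 0.0, 2.0)  # a point 2 m in front of the first camera

uv_first = project(K, w2c_first, point)
uv_moved = project(K, w2c_moved, point)
print(uv_first, uv_moved)
```

The point lands on the principal point in the first frame and shifts to the left in the second, even though K is identical in both calls. Only the pose changed.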

More links: https://ksimek.github.io/2012/08/14/decompose/

You can see it like this: when you move your smartphone, the state of the phone changes (position and rotation), but the camera remains the same (the lens is the same). So the camera center relative to the camera does not change.

It's a starting point but hopefully it can guide you a bit through your journey.

whwh747 commented 6 months ago

Thank you, I got it. The camera center is an intrinsic parameter, so under ideal conditions, the center of the same camera wouldn't change. I suspect my confusion may stem from the use of multiple cameras in 3DGS, whereas there is only one camera in SplaTAM. Please allow me to express my gratitude once again, your explanation is very clear!

AlexMorgand commented 6 months ago

> The camera center is an intrinsic parameter, so under ideal conditions, the center of the same camera wouldn't change.

Exactly. Also, when playing with ARKit, you will see that the camera intrinsics change for ALL images, as ARKit computes its own self-calibration, so I advise having intrinsics for every frame for iPhone data...
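
As a rough sketch of what "intrinsics for every frame" could look like in a data loader: the names below (Intrinsics, intrinsics_for_frame, max_focal_drift) are hypothetical and not part of SplaTAM or ARKit, and the numbers are invented. The idea is to use the per-frame calibration when the capture provides one and fall back to a shared calibration otherwise.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Intrinsics:
    fx: float  # focal length in pixels (x)
    fy: float  # focal length in pixels (y)
    cx: float  # principal point (x)
    cy: float  # principal point (y)


def intrinsics_for_frame(idx: int,
                         shared: Intrinsics,
                         per_frame: Optional[Sequence[Intrinsics]] = None) -> Intrinsics:
    """Return the per-frame calibration if available, else the shared one."""
    if per_frame is not None and 0 <= idx < len(per_frame):
        return per_frame[idx]
    return shared


def max_focal_drift(per_frame: Sequence[Intrinsics]) -> float:
    """Largest focal-length deviation (in pixels) from the first frame."""
    ref = per_frame[0]
    return max(max(abs(k.fx - ref.fx), abs(k.fy - ref.fy)) for k in per_frame)


# Invented values standing in for a self-calibrating capture.
shared = Intrinsics(1000.0, 1000.0, 640.0, 360.0)
per_frame = [
    Intrinsics(1000.0, 1000.0, 640.0, 360.0),
    Intrinsics(1002.5, 1002.5, 640.5, 360.2),
]

k_frame1 = intrinsics_for_frame(1, shared, per_frame)
k_fallback = intrinsics_for_frame(1, shared)
drift = max_focal_drift(per_frame)
print(k_frame1, k_fallback, drift)
```

A check like max_focal_drift is one way to decide whether the constant-intrinsics assumption is acceptable for a given sequence.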

> I suspect my confusion may stem from the use of multiple cameras in 3DGS

I see! Since this is a SLAM method, we usually assume a video sequence, i.e. the same camera within a short timeframe (minutes), so we can assume that the camera parameters don't change too much. It's also easier to build an optimisation problem with one camera, as it reduces the number of parameters to optimise.
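
A back-of-the-envelope count shows the effect on the problem size. This sketch assumes, purely for illustration, that intrinsics were treated as unknowns with 4 parameters (fx, fy, cx, cy, no distortion) and that each frame's pose has 6 degrees of freedom. It says nothing about which parameters SplaTAM actually optimises.

```python
POSE_DOF = 6          # 3 for rotation + 3 for translation, per frame
INTRINSIC_PARAMS = 4  # fx, fy, cx, cy (distortion ignored in this sketch)


def num_camera_params(num_frames: int, shared_intrinsics: bool) -> int:
    """Count camera unknowns for a sequence of num_frames frames."""
    if shared_intrinsics:
        intrinsic_count = INTRINSIC_PARAMS               # one camera for all frames
    else:
        intrinsic_count = INTRINSIC_PARAMS * num_frames  # one calibration per frame
    return POSE_DOF * num_frames + intrinsic_count


shared_count = num_camera_params(1000, shared_intrinsics=True)
per_frame_count = num_camera_params(1000, shared_intrinsics=False)
print(shared_count, per_frame_count)
```

For 1000 frames this gives 6004 unknowns with one shared camera versus 10000 with per-frame intrinsics.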

> Please allow me to express my gratitude once again, your explanation is very clear

My pleasure! Thank you for the kind words.

whwh747 commented 6 months ago

Thank you so much, I will close this issue.