szymanowiczs / splatter-image

Official implementation of "Splatter Image: Ultra-Fast Single-View 3D Reconstruction" (CVPR 2024)
https://szymanowiczs.github.io/splatter-image
BSD 3-Clause "New" or "Revised" License

Question about gaussian prediction #28

Open Pixie8888 opened 5 months ago

Pixie8888 commented 5 months ago

Dear author,

Thank you very much for sharing the code! I have some questions about the Gaussian prediction:

  1. Does the network try to predict the ground-truth depth for each foreground pixel? In other words, is the rendering good only if each Gaussian is located close to the ground-truth depth?
  2. Why is pos normalized at line 786 (shown in the attached screenshot)?

johnren-code commented 5 months ago

Same question. Have you conducted any experiments to validate your ideas?

huoxingdawang commented 4 months ago

Hello. To my understanding, pos is calculated from depth:

https://github.com/szymanowiczs/splatter-image/blob/98b465731c3273bf8f42a747d1b6ce1a93faf3d6/scene/gaussian_predictor.py#L773

In get_pos_from_network_output, you can see that depth_act(depth_network), which lies between 0 and 1, is converted by znear and zfar to a depth in the same units as the camera pose:

https://github.com/szymanowiczs/splatter-image/blob/98b465731c3273bf8f42a747d1b6ce1a93faf3d6/scene/gaussian_predictor.py#L706

The depth is then converted to pos, so pos is also in the same units as the camera pose:

https://github.com/szymanowiczs/splatter-image/blob/98b465731c3273bf8f42a747d1b6ce1a93faf3d6/scene/gaussian_predictor.py#L710

So, in my opinion, if the arguments znear and zfar are not set correctly, the rendering will be bad.
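To make the depth mapping concrete, here is a minimal sketch in plain Python. It assumes that depth_act is a sigmoid and that the rescaling is linear between znear and zfar. The function names and the znear/zfar values are made up for illustration and are not taken from the repository.

```python
import math

def sigmoid(x: float) -> float:
    # Assumed activation: squashes the raw network output into (0, 1).
    return 1.0 / (1.0 + math.exp(-x))

def depth_from_network_output(raw: float, znear: float, zfar: float) -> float:
    # Linearly rescale the (0, 1) activation to the range [znear, zfar],
    # so the depth is expressed in the same units as the camera pose.
    return sigmoid(raw) * (zfar - znear) + znear

def unproject(ray_dir, depth: float):
    # ray_dir is a camera-space ray direction with z = 1;
    # scaling it by the depth gives the 3D position along that ray.
    return tuple(c * depth for c in ray_dir)

# Hypothetical values, for illustration only.
znear, zfar = 0.8, 1.8
depth = depth_from_network_output(0.0, znear, zfar)  # sigmoid(0) = 0.5, the midpoint
pos = unproject((0.1, -0.2, 1.0), depth)
```

Under these assumptions the predicted depth can never leave [znear, zfar], so if the true surface lies outside that range, no network output can place a Gaussian on it.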

As for line 786:

Lines 782 to 784 append a new dimension to pos, so pos becomes [x,y,z,1] rather than [x,y,z]. This is because source_cameras_view_to_world in line 785 is a 4x4 rotation + translation matrix, so an extra 1 is needed for the matrix multiplication to be possible.

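After line 785, [x,y,z,1] needs to be converted back to [x,y,z]. However, after the multiplication the vector is in general [x,y,z,w], where w is not necessarily 1, so [x,y,z] must be recovered as [x/w,y/w,z/w]. This is a homogeneous divide, not a "normalization". (For a pure rotation + translation matrix whose last row is [0,0,0,1], w stays 1 and the division changes nothing.)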
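The homogeneous-coordinate step can be sketched as follows, in plain Python without PyTorch. The helper names, the example matrices, and the small eps added to w are assumptions for illustration and are not copied from the repository.

```python
def matvec(m, v):
    # Multiply a matrix (list of rows) by a vector.
    return [sum(m[i][j] * v[j] for j in range(len(v))) for i in range(len(m))]

def transform_point(mat4, xyz, eps=1e-10):
    # Append 1 to get [x, y, z, 1], apply the 4x4 matrix,
    # then divide by w to return to ordinary 3D coordinates.
    x, y, z, w = matvec(mat4, [*xyz, 1.0])
    return [x / (w + eps), y / (w + eps), z / (w + eps)]

# Hypothetical rigid transform: 90 degree rotation about z plus a translation.
view_to_world = [
    [0.0, -1.0, 0.0, 1.0],
    [1.0,  0.0, 0.0, 2.0],
    [0.0,  0.0, 1.0, 3.0],
    [0.0,  0.0, 0.0, 1.0],
]
world = transform_point(view_to_world, [1.0, 0.0, 0.0])

# Hypothetical projective matrix whose last row copies z into w,
# so the divide actually changes the result.
projective = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
]
divided = transform_point(projective, [2.0, 4.0, 2.0])
```

With the rigid matrix, w comes out as 1 and the divide leaves the point unchanged; with the projective matrix, w equals z and every coordinate is scaled by 1/w.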