nerfstudio-project / nerfstudio

A collaboration friendly studio for NeRFs
https://docs.nerf.studio
Apache License 2.0

Splatfacto new POV image quality #2863

Open pierremerriaux-leddartech opened 9 months ago

pierremerriaux-leddartech commented 9 months ago

Hi, this is not really a bug, but more a question about splatfacto image quality and parameters. If this is not the right place, just let me know. I am working on reconstructing driving scenes from PandaSet. The camera poses lie along a car trajectory. When I render an image from this trajectory (train or eval set), the quality is very good. Below is an example of one image from the eval set: image

But if I move the target vehicle and the ego camera a bit (1 m to the left and 0.5 m up), with the camera direction staying very close to the original one: image The quality decreases very quickly; it is very sensitive to direction. I tried reducing the SH degree from 3 to 1 to limit overfitting, with no real improvement. Admittedly, the images of a driving scene cover it less densely than a video sequence circling a single object. But a few months ago I did the same with nerfacto: the quality from the initial poses was lower, but it was much less sensitive to new POVs. Below is a short video of a nerfacto reconstruction from 5 cameras (front, side, and front-side) on the vehicle:

https://github.com/nerfstudio-project/nerfstudio/assets/42007976/8b347e4e-325c-401e-bed8-27896f160cc1
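For readers who want to reproduce the shifted viewpoint described above, here is a minimal sketch of moving a camera along its own axes. It assumes a 4x4 camera-to-world matrix in the OpenGL-style convention (column 0 = right, column 1 = up, column 2 = backward); the function name `shift_camera` is hypothetical and not part of nerfstudio.

```python
def shift_camera(c2w, right=0.0, up=0.0, forward=0.0):
    """Return a copy of a 4x4 camera-to-world matrix moved along the camera's own axes.

    Assumption: OpenGL-style axes, i.e. column 0 = right, column 1 = up,
    column 2 = backward (the camera looks down its -z axis).
    """
    new = [list(row) for row in c2w]
    for i in range(3):
        new[i][3] = (c2w[i][3]
                     + right * c2w[i][0]     # move along the camera's x axis
                     + up * c2w[i][1]        # move along the camera's y axis
                     - forward * c2w[i][2])  # forward is -z in this convention
    return new


# Camera at the world origin with identity orientation.
identity = [[1.0, 0.0, 0.0, 0.0],
            [0.0, 1.0, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0],
            [0.0, 0.0, 0.0, 1.0]]

# 1 m to the left and 0.5 m up, keeping the viewing direction unchanged.
shifted = shift_camera(identity, right=-1.0, up=0.5)
```

Only the translation column changes, so the rendered view keeps the original orientation, which matches the experiment described in the post.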

I improved my results with nerfacto by using camera pose optimization. So I tried to do the same for splatfacto by cherry-picking the viewmat gradient backward pass from the gsplat branch https://github.com/nerfstudio-project/gsplat/tree/vickie/camera-grads. For now, it hasn't improved the results. Similarly, if I try with multiple cameras (or just a side camera in place of the front one), the quality is much less impressive. Below is an example with 5 cameras and camera optimization (which does not seem to have a big influence). Left view: image Front view: image

Do you have any idea what I should try, first to improve new POV synthesis and second to improve multi-camera reconstruction? And has anyone worked on camera pose optimization for splatfacto?

And just for fun, below is an example of what we can do with objects and multiple splatfacto instances:

https://github.com/nerfstudio-project/nerfstudio/assets/42007976/8f85ddef-bb3c-481f-b0f5-8490ae53603a

Thanks for your input. I will implement a depth loss from lidar to see if it helps.

jb-ye commented 9 months ago

How did you calculate the poses for those cameras?

pierremerriaux-leddartech commented 9 months ago

Hi @jb-ye, I got them directly from the dataset. Cameras, lidar point clouds and objects are provided in world coordinates.

jb-ye commented 9 months ago

> Hi @jb-ye, I got them directly from the dataset. Cameras, lidar point clouds and objects are provided in world coordinates.

How do you validate the accuracy of that data? Training a NeRF/Gaussian Splatting model demands much more accurate pose estimates than most autonomous robot stacks do. It probably wouldn't work at all if you use those pre-calculated poses.
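To give a rough sense of why pose accuracy matters so much here, the following back-of-the-envelope sketch converts a small pose error into an image-space shift under a pinhole model, for a point near the optical axis. The focal length, error magnitudes and function names are illustrative assumptions, not values from PandaSet.

```python
import math


def pixel_shift_from_rotation(focal_px, angle_deg):
    """Approximate image shift (pixels) caused by a small camera rotation error."""
    return focal_px * math.tan(math.radians(angle_deg))


def pixel_shift_from_translation(focal_px, offset_m, depth_m):
    """Approximate image shift (pixels) of a point at depth_m caused by a
    lateral camera position error of offset_m."""
    return focal_px * offset_m / depth_m


# Hypothetical camera with a 1000-pixel focal length.
rot_shift = pixel_shift_from_rotation(1000.0, 0.1)              # 0.1 degree error
trans_shift = pixel_shift_from_translation(1000.0, 0.05, 10.0)  # 5 cm error, point 10 m away
print(f"rotation: {rot_shift:.2f} px, translation: {trans_shift:.2f} px")
```

Under these assumed numbers, errors that are small by odometry standards already move scene content by more than a pixel, which is enough to blur or duplicate fine detail in a photometric reconstruction.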

pierremerriaux-leddartech commented 9 months ago

Hi @jb-ye, thanks for your message. I have no real way to validate the accuracy of the camera poses. I only validated the accumulation of lidar frames in the world reference frame, and it was pretty good. With nerfacto and camera optimization activated, it worked pretty well on the same sequence. I tested camera optimization with splatfacto, but saw no real improvement. I mainly have two questions:

I ran this experiment on camera pose optimization: https://github.com/nerfstudio-project/gsplat/issues/119

jb-ye commented 9 months ago

(1) I don't think camera opt would work with Gaussian splatting by just backpropagating gradients; it requires some non-trivial innovation. (2) The fact that shifting by 1 m causes a significant quality decrease indicates the poses are not sufficiently accurate.

kerrj commented 9 months ago

Couple things you can try: 1) COLMAP the poses to get a sort of upper bound on quality 2) export the poses from a trained nerfacto model into splatfacto

We're working on camera backprop in gsplat, but not sure when it will be finished. Something you could do is use the PyTorch implementation of project_gaussians from this PR, which is slower than the CUDA version but would backprop gradients through the camera matrix. We haven't tested it much for 3D pose optimization though, and I'd expect some work would be needed on the nerfstudio side to make pose optimization work well (optimizer param tuning, maybe warmup on gaussians, etc.).

pierremerriaux-leddartech commented 9 months ago

Hi @kerrj and @jb-ye Thanks for your answers and your help.

thanks

lxzbg commented 9 months ago

> (1) I don't think camera opt would work with Gaussian splatting by just backpropagating gradients; it requires some non-trivial innovation. (2) The fact that shifting by 1 m causes a significant quality decrease indicates the poses are not sufficiently accurate.

@jb-ye "camera opt requires some non-trivial innovation": I'm very interested in this question. Can you tell me more about it? I thought camera opt didn't work because, unlike nerfacto's pixel-level training, 3DGS trains at the image level.

kerrj commented 9 months ago

I haven't tested the PyTorch implementation within splatfacto for camera optimization yet, but I'd be interested in what happens if you try! The gradients should be correct for camera optimization, but it is significantly slower than the CUDA version.

Also, for all of these changes, you redefined the camera optimizer inside splatfacto.py and used apply_to_camera inside get_outputs, right?

pierremerriaux-leddartech commented 9 months ago

Hi @kerrj, sure, I have reinserted apply_to_camera and the other pieces. I also plotted the evolution of the camera poses during training. I will keep you posted once we test the PyTorch implementation. Thanks.

jb-ye commented 9 months ago

> (1) I don't think camera opt would work with Gaussian splatting by just backpropagating gradients; it requires some non-trivial innovation. (2) The fact that shifting by 1 m causes a significant quality decrease indicates the poses are not sufficiently accurate.
>
> @jb-ye "camera opt requires some non-trivial innovation": I'm very interested in this question. Can you tell me more about it? I thought camera opt didn't work because, unlike nerfacto's pixel-level training, 3DGS trains at the image level.

You are right: 3DGS operates per image, so the gradient doesn't reflect cross-frame consistency. That's why I said it is non-trivial work and needs some fresh ideas.

MartinEthier commented 9 months ago

> Couple things you can try:
>
>   1. COLMAP the poses to get a sort of upper bound on quality
>   2. export the poses from a trained nerfacto model into splatfacto
>
> We're working on camera backprop in gsplat, but not sure when it will be finished. Something you could do is use the PyTorch implementation of project_gaussians from this PR, which is slower than the CUDA version but would backprop gradients through the camera matrix. We haven't tested it much for 3D pose optimization though, and I'd expect some work would be needed on the nerfstudio side to make pose optimization work well (optimizer param tuning, maybe warmup on gaussians, etc.).

@kerrj I am trying to set up a proper evaluation while using pose optimization on my dataset. The poses come from a SLAM system, so they're not as accurate as COLMAP's. Before trying to implement test-time pose optimization, I figured a simpler approach would be to do what you suggested: train a nerfacto model with pose optimization on the merged train and eval dataset, export the poses, and then train and evaluate models on the exported poses without pose optimization. However, when I train a model on the exported poses without pose optimization, I get worse train and eval performance than when I train on the original dataset without pose optimization, which is not what I expected. Do you have any ideas on how to get this working? It also seems the final optimized poses differ between models. I did a pose optimization run with nerfacto and with nerfacto-big, and the camera_opt_translation and camera_opt_rotation values they converged to differ by about 0.1 to 0.2.
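One way to quantify how much two sets of optimized poses disagree is to compare them camera by camera. The sketch below is an illustration only: the helper names are hypothetical, and it assumes both poses are already expressed in the same coordinate frame and units (poses from separate runs may first need to be aligned by a global rigid transform).

```python
import math


def rotation_angle_deg(R_a, R_b):
    """Angle (degrees) of the relative rotation between two 3x3 rotation matrices."""
    # trace(R_a^T R_b) equals the sum of elementwise products of the two matrices.
    trace = sum(R_a[i][j] * R_b[i][j] for i in range(3) for j in range(3))
    cos_angle = max(-1.0, min(1.0, (trace - 1.0) / 2.0))  # clamp against round-off
    return math.degrees(math.acos(cos_angle))


def translation_distance(t_a, t_b):
    """Euclidean distance between two camera positions."""
    return math.dist(t_a, t_b)


# Example: identity orientation versus a 10 degree rotation about the z axis.
c, s = math.cos(math.radians(10.0)), math.sin(math.radians(10.0))
R_identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
R_rotated = [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

angle = rotation_angle_deg(R_identity, R_rotated)
dist = translation_distance([0.0, 0.0, 0.0], [0.1, 0.2, 0.2])
print(f"rotation difference: {angle:.2f} deg, translation difference: {dist:.2f}")
```

Reporting the per-camera differences in degrees and scene units makes it easier to judge whether a disagreement between two runs is large relative to the scene scale.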

Nplace-su commented 9 months ago

@pierremerriaux-leddartech Hi, I wonder how you added objects to the scene in your last video. Is it like an implementation of Street Gaussians?

pierremerriaux-leddartech commented 9 months ago

@Nplace-su, yes, we were inspired by Street Gaussians.

jb-ye commented 9 months ago

@MartinEthier It is possible for pose optimization to make things worse. Things you can try:

(1) Decrease the learning rate of the poses, and experiment a bit more with the learning rate schedule.
(2) Pose optimization may not always be deterministic or convergent, but we know there is only one possible global optimum for the poses. Therefore, one has to do a determinism/convergence check, which to me is a non-trivial task.
(3) What you observe with nerfacto and nerfacto-big just shows that this is not a robust technique; one has to use it with caution.
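As an illustration of point (1), and of the "warmup on gaussians" idea mentioned earlier in the thread, here is a minimal sketch of a pose learning-rate schedule that keeps poses frozen for an initial period and then decays exponentially. The function name and every numeric value are hypothetical placeholders, not nerfstudio defaults.

```python
def pose_lr(step, base_lr=1e-4, final_lr=1e-6, delay_steps=1000, max_steps=30000):
    """Learning rate for the camera pose parameters at a given training step.

    All defaults are illustrative assumptions. Poses stay frozen for
    `delay_steps` so the scene can settle first; afterwards the rate decays
    exponentially from base_lr to final_lr.
    """
    if step < delay_steps:
        return 0.0
    progress = min(1.0, (step - delay_steps) / (max_steps - delay_steps))
    return base_lr * (final_lr / base_lr) ** progress


for step in (0, 1000, 15000, 30000):
    print(step, pose_lr(step))
```

A smaller `base_lr` or a longer delay makes the pose updates more conservative. Rerunning the same configuration and comparing the resulting poses is one simple form of the convergence check mentioned in point (2).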

li199603 commented 5 months ago

> (1) I don't think camera opt would work with Gaussian splatting by just backpropagating gradients; it requires some non-trivial innovation. (2) The fact that shifting by 1 m causes a significant quality decrease indicates the poses are not sufficiently accurate.
>
> @jb-ye "camera opt requires some non-trivial innovation": I'm very interested in this question. Can you tell me more about it? I thought camera opt didn't work because, unlike nerfacto's pixel-level training, 3DGS trains at the image level.
>
> You are right: 3DGS operates per image, so the gradient doesn't reflect cross-frame consistency. That's why I said it is non-trivial work and needs some fresh ideas.

3DGS always tries its best to fit the input images, even if the camera attitude is not accurate. In other words, 3DGS always creates local optima for camera optimization.

karthik101200 commented 4 months ago

> @Nplace-su, yes, we were inspired by Street Gaussians.

Hi, a little late to the party. I have lidar odometry and the transformation between camera and lidar in my ROS TF tree, which on paper should give better localization than COLMAP. I am saving this in transforms.json after converting to the OpenGL convention to bypass COLMAP, but the results are much, much worse. splatfacto gives some output with very bad depth, but nerfacto (or any other nerfstudio NeRF model) doesn't run at all. Is there a way to debug this issue, in your view? It's a custom dataset that I am generating from a rosbag. Thanks in advance.
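A common source of this kind of failure is a camera-axis convention mismatch, so a first debugging step can be to check the conversion itself. The sketch below assumes the input is a 4x4 camera-to-world matrix in the OpenCV / ROS optical-frame convention (x right, y down, z forward) and converts it to the OpenGL-style convention (x right, y up, z backward). The helper names are hypothetical. Note that ROS body frames (x forward, y left, z up) are a different convention again and would need an extra rotation first.

```python
def opencv_to_opengl(c2w_cv):
    """Convert a 4x4 camera-to-world matrix from OpenCV axes to OpenGL axes.

    Negating columns 1 and 2 flips the camera's y and z axes; the camera
    position (column 3) is unchanged.
    """
    rows = [[row[0], -row[1], -row[2], row[3]] for row in c2w_cv[:3]]
    return rows + [list(c2w_cv[3])]


def is_valid_rotation(c2w, tol=1e-6):
    """Check that the upper-left 3x3 block is orthonormal with determinant +1."""
    R = [row[:3] for row in c2w[:3]]
    for i in range(3):
        for j in range(3):
            dot = sum(R[k][i] * R[k][j] for k in range(3))  # entry (i, j) of R^T R
            if abs(dot - (1.0 if i == j else 0.0)) > tol:
                return False
    det = (R[0][0] * (R[1][1] * R[2][2] - R[1][2] * R[2][1])
           - R[0][1] * (R[1][0] * R[2][2] - R[1][2] * R[2][0])
           + R[0][2] * (R[1][0] * R[2][1] - R[1][1] * R[2][0]))
    return abs(det - 1.0) <= tol


# Hypothetical OpenCV-style pose: identity orientation, camera at (1, 2, 3).
pose_cv = [[1.0, 0.0, 0.0, 1.0],
           [0.0, 1.0, 0.0, 2.0],
           [0.0, 0.0, 1.0, 3.0],
           [0.0, 0.0, 0.0, 1.0]]
pose_gl = opencv_to_opengl(pose_cv)
print(is_valid_rotation(pose_gl))
```

If the validity check fails for any frame (for example because only one axis was flipped, which yields determinant -1), the poses are mirrored or otherwise malformed. Plotting the converted camera positions against the lidar trajectory is another inexpensive sanity check.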