Open Yuhuoo opened 6 months ago
Hi @Yuhuoo, thank you for your question. Could you give more details about the dataset you used? How many images are there? Are the cameras sparse?
Here is my initial guess: your dataset might be small, or the cameras might be very sparse. In that case the distance between groups of actual cameras is large, and DUSt3R might not estimate the camera parameters correctly. You see noisy Gaussians because this function generates new camera parameters that weren't used in training. It seems the Gaussians have overfitted to the ground-truth camera parameters and haven't generalized to new camera views, which is why the reconstruction looks so noisy. In simple terms, you might need to add more intermediate images (make the dataset a little denser) for a better reconstruction.
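One rough way to check whether the cameras are sparse is to look at how far each camera center is from its nearest neighbor. The sketch below is only an illustration and is not part of this project: the function name `nearest_neighbor_gaps` and the example poses are hypothetical, and it assumes the poses are available as 4x4 camera-to-world matrices.

```python
import numpy as np

def nearest_neighbor_gaps(c2w):
    """Return, for each camera, the distance to its nearest neighboring camera.

    c2w: array of shape (N, 4, 4) holding camera-to-world matrices
    (an assumed convention; the camera center is the translation column).
    """
    centers = np.asarray(c2w, dtype=np.float64)[:, :3, 3]
    # Pairwise distances between all camera centers.
    diff = centers[:, None, :] - centers[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    # Ignore the zero distance of each camera to itself.
    np.fill_diagonal(dist, np.inf)
    return dist.min(axis=1)

# Hypothetical rig: four cameras along the x axis, with one large gap.
poses = np.tile(np.eye(4), (4, 1, 1))
poses[:, 0, 3] = [0.0, 0.1, 0.2, 2.0]
gaps = nearest_neighbor_gaps(poses)
print(gaps)
```

A camera whose gap is much larger than the others marks a region where novel views would have to be rendered far from any training view.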
Thank you for your reply. I wanted to ask if you have modified the code of GaussianSplatting?
It's strange that, after training, the results from this project differ from those of the original GaussianSplatting project https://github.com/graphdeco-inria/gaussian-splatting, even though I used the same data: the point cloud and camera poses obtained from DUSt3R.
Why does `self.world_view_transform` need to be multiplied by 10000?
@Yuhuoo did you manage to figure out why the multiplication by 10000 is needed? We also have a similar issue with a noisy reconstruction.
I did this to work around numerical instability. First, there's no need for concern: the values are normalized before being used in the render function.
DUSt3R outputs camera parameters in a very small range, with some values extremely close to 0, and these sometimes saturate to 0 after normalization before rendering. Multiplying by 10000 makes the values larger, so after normalization they come close to zero but do not reach it. Without this adjustment, artifacts appear in the rendered images. You can comment out this line; in theory, it shouldn't affect the classical Gaussian splatting pipeline.
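To illustrate the kind of effect described here, the sketch below shows how very small values can collapse toward zero when a normalization step adds a small epsilon to the denominator, and how pre-scaling avoids that. This is an assumption made for illustration only: the thread does not show how the renderer actually normalizes, and the `normalize` helper, the epsilon, and the sample values are all hypothetical.

```python
import numpy as np

def normalize(v, eps=1e-8):
    """Divide by the L2 norm, with a small epsilon guarding against division by zero.

    Hypothetical stand-in for the normalization mentioned in the thread.
    """
    v = np.asarray(v, dtype=np.float32)
    return v / (np.linalg.norm(v) + np.float32(eps))

# Hypothetical very small values, smaller than the epsilon above.
tiny = np.array([3e-10, 4e-10, 0.0], dtype=np.float32)
SCALE = np.float32(10000.0)

plain = normalize(tiny)           # epsilon dominates the denominator
scaled = normalize(tiny * SCALE)  # the norm dominates the denominator

print("without scaling:", plain, "norm =", np.linalg.norm(plain))
print("with scaling:   ", scaled, "norm =", np.linalg.norm(scaled))
```

In this toy setup the unscaled vector ends up with a norm far below 1, while the pre-scaled vector comes out close to unit length. When the inputs are not tiny, the scale factor cancels in the division and the result is essentially unchanged.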
Thanks @nerlfield
Why does this function generate such high noise in the camera pose?
The first image below is the result rendered from the training dataset, while the second is the corresponding frame from the rendered video.