zhengzhang01 / Pixel-GS

[ECCV 2024] Pixel-GS Density Control with Pixel-aware Gradient for 3D Gaussian Splatting

How to set the depth threshold #1

Open AIBluefisher opened 5 months ago

AIBluefisher commented 5 months ago

Hi, @zhengzhang01

I wonder how you obtained the hyperparameter $\gamma_{\text{depth}}$ used in the gradient scaling factor. In the paper, you mention that you set it through an experiment. However, for different scenes, how can we obtain a roughly reliable value without exhaustive experiments?

zhengzhang01 commented 5 months ago

In our paper, we apply the same $\gamma_{\text{depth}}$, set to 0.37, to all scenes discussed (a total of 30 scenes, including 9 from Mip-NeRF360 and 21 from Tanks & Temples). The current setting of $\gamma_{\text{depth}}$ is generalized and can be used for almost all scenes because it does not depend on the scene scale, which is instead specified by $\mathrm{radius}$.

For different scenes, we use $\mathrm{radius}$ to represent the scene scale, calculated according to Equation (9) in the paper. As illustrated in the figure below, $\mathrm{radius}$ is defined as 1.1 times the distance from the center point O of all cameras to the furthest camera.

[figure: camera layout used to define $\mathrm{radius}$]

For each viewpoint, we scale the gradients used to "split" or "clone" the Gaussians that lie within a distance of $\gamma_{\text{depth}} \times \mathrm{radius}$ from the camera. The scaling factor is $\left( \frac{\mathrm{depth}}{\gamma_{\text{depth}} \times \mathrm{radius}} \right)^2$, which is the coefficient computed by Equation (10) in the paper.
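For concreteness, here is a minimal NumPy sketch of how these two quantities could be computed. The names (`scene_radius`, `depth_scale_factor`, `cam_centers`, `depths`) are illustrative and are not taken from the Pixel-GS codebase:

```python
import numpy as np

def scene_radius(cam_centers: np.ndarray) -> float:
    """Scene scale in the spirit of Eq. (9): 1.1x the distance from the
    camera centroid O to the furthest camera. cam_centers has shape (N, 3)."""
    center = cam_centers.mean(axis=0)
    return 1.1 * np.linalg.norm(cam_centers - center, axis=1).max()

def depth_scale_factor(depths: np.ndarray, radius: float,
                       gamma_depth: float = 0.37) -> np.ndarray:
    """Per-Gaussian scaling of the densification gradient, following the
    description of Eq. (10): Gaussians closer than gamma_depth * radius get
    their gradient scaled by (depth / (gamma_depth * radius))^2; farther
    Gaussians keep a factor of 1 (the gradient is never amplified)."""
    threshold = gamma_depth * radius
    return np.minimum(depths / threshold, 1.0) ** 2

# Illustrative usage: scale the accumulated densification gradient norms
# scaled_grads = depth_scale_factor(depths, scene_radius(cam_centers)) * grads
```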

Therefore, our hyperparameter for gradient scaling, $\gamma_{\text{depth}} = 0.37$, should be applicable to almost all scenes, as it is independent of the scene scale $\mathrm{radius}$.

AIBluefisher commented 5 months ago

Thanks for your detailed explanation. But it looks like this value has only been validated on object-centric scenes, so I'm not sure whether the same value of 0.37 can be applied to other scene types. For example, for aerial images, most cameras point toward the ground plane. In that case, should this value be adjusted according to the flight height of the drone instead of the scene radius? Is that correct?

AIBluefisher commented 5 months ago

Intuitively, this threshold encourages farther points (whose depths exceed the threshold) to have larger gradients and closer points to have smaller gradients. I think this parameter should be tuned per scene, and this could be done automatically by examining the depths of the sparse points, e.g. as in the sketch below.
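A rough sketch of what such an automatic choice could look like, assuming the SfM sparse point cloud and the camera centers are available; `sparse_depth_threshold` and the percentile value are purely illustrative, not part of Pixel-GS:

```python
import numpy as np

def sparse_depth_threshold(points_xyz: np.ndarray,
                           cam_centers: np.ndarray,
                           percentile: float = 50.0) -> float:
    """Per-scene depth threshold from the sparse point cloud: compute the
    distance of every sparse point to every camera center and reduce the
    distribution to a single robust statistic (here, a percentile)."""
    # (num_cams, num_points) matrix of camera-to-point distances
    dists = np.linalg.norm(
        points_xyz[None, :, :] - cam_centers[:, None, :], axis=-1)
    return float(np.percentile(dists, percentile))
```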

zhengzhang01 commented 5 months ago

Thank you for your response!

In the original 3DGS, the learning rate for xyz is also scaled by $\mathrm{radius}$, which means that the original 3DGS already treats $\mathrm{radius}$ as the scene scale. If the original 3DGS can optimize a scene well, our method should work similarly. Regarding drone aerial scenes, such as those in the Mega-NeRF dataset, we previously found that the original 3DGS achieves excellent reconstruction results. This indicates that using $\mathrm{radius}$ as the scene scale is a reasonable approach for setting the xyz learning rate in drone aerial scenes.

Therefore, using $\mathrm{radius}$ to represent the scene scale is reasonable for drone aerial scenes, and our method should also be expected to work effectively in these settings.

However, for certain scenarios, using the depth values of points to represent the scene scale is indeed a more rational choice, such as when the camera stays at a fixed position and only rotates in place. One issue with using depth to represent scene scale is that the maximum depth can be overestimated by a few particularly distant points. For these types of scenarios, first applying a method like KNN-based filtering to even out the point cloud distribution, and then using the mean depth value to represent the scene scale, might be a more sensible approach.
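A minimal sketch of this idea (a simple k-NN statistical-outlier filter followed by the mean depth); `robust_mean_depth`, `k`, and `std_factor` are illustrative choices and not part of the Pixel-GS code:

```python
import numpy as np
from scipy.spatial import cKDTree

def robust_mean_depth(points_xyz: np.ndarray,
                      cam_center: np.ndarray,
                      k: int = 8,
                      std_factor: float = 2.0) -> float:
    """Drop sparse points whose mean k-NN distance is a statistical outlier,
    then return the mean distance of the surviving points to the camera as
    a scene-scale estimate."""
    tree = cKDTree(points_xyz)
    # query k+1 neighbours because each point's nearest neighbour is itself
    knn_dists, _ = tree.query(points_xyz, k=k + 1)
    mean_knn = knn_dists[:, 1:].mean(axis=1)
    keep = mean_knn < mean_knn.mean() + std_factor * mean_knn.std()
    depths = np.linalg.norm(points_xyz[keep] - cam_center, axis=1)
    return float(depths.mean())
```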

Additionally, our method does not amplify the gradients of points that are far from the camera; it only reduces the gradients of points that are close to the camera. This approach helps to suppress the occurrence of many "floaters" near the camera, which are often generated through "split" or "clone" processes.