graphdeco-inria / gaussian-splatting

Original reference implementation of "3D Gaussian Splatting for Real-Time Radiance Field Rendering"
https://repo-sam.inria.fr/fungraph/3d-gaussian-splatting/

Ask some questions for the paper #67

Closed Bin-ze closed 1 year ago

Bin-ze commented 1 year ago

Thank you so much for your great work! I have read the paper, but there are some things I don't understand that I would like to ask about:

  1. How is the opacity α predicted? For each 3D Gaussian, is it obtained by passing a learnable parameter through an activation function?
  2. The abstract mentions: "preserve desirable properties of continuous volumetric radiance fields for scene optimization". How should I understand this relationship?
  3. Section 2.2 of the paper mentions: "The use of MVS-based geometry is a major drawback of most of these methods". What is MVS-based geometry? Is it a sparse point cloud?
  4. Section 2.3 of the paper mentions: "Our rasterization respects visibility order in contrast to their order-independent method." What does visibility order mean?
  5. What is fast α-blending? I am new to computer graphics.
  6. Why can the covariance be described by an R matrix and an S matrix?
  7. Section 2.3 of the paper mentions: "computes the color C of a pixel by blending N ordered points overlapping the pixel." How should I understand N? Can it be understood as the number of Gaussians overlapping along a ray through the pixel?
  8. A 3D Gaussian is used as the basic element, so what is the physical meaning of the value a 3D Gaussian outputs? Can it be understood as a probability distribution describing opacity centered on that point?
  9. Section 5.1 of the paper mentions: "Inevitably, geometry may be incorrectly placed due to the ambiguities of 3D to 2D projection." What do the ambiguities refer to here? I think 3D-to-2D projection is a deterministic process. Does this refer to errors caused by floating-point precision, or to collinear 3D points?
  10. Section 5.1 of the paper mentions: "An effective way to moderate the increase in the number of Gaussians is to set the α value close to zero every N = 3000 iterations". α means the opacity, which is learned; does setting it to zero every 3000 steps mean reinitializing it directly, so that α is relearned? I checked the code, but I can't understand why it does this.

Sorry for asking so many questions; looking forward to your reply!

grgkopanas commented 1 year ago

Thank you so much for your great work! I have read the paper, but there are some things I don't understand that I would like to ask about:

  1. How is the opacity α predicted? For each 3D Gaussian, is it obtained by passing a learnable parameter through an activation function?

There is no prediction; we optimize alpha like every other property of the Gaussians. This is just gradient descent over all the parameters of all the Gaussians, including alpha, so that it minimizes the loss.
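A minimal sketch of that idea: the optimizer updates an unconstrained raw parameter, and an activation maps it into a valid opacity range. The names here are illustrative, not the repository's actual API (the implementation stores a raw tensor and applies a sigmoid in the same spirit).

```python
import numpy as np

def sigmoid(x):
    # squashes an unconstrained raw parameter into a valid opacity in (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

# hypothetical raw (pre-activation) opacity values, one per Gaussian;
# gradient descent updates these raw values, not alpha directly
raw_opacity = np.array([-2.0, 0.0, 3.0])
alpha = sigmoid(raw_opacity)
```

Because the sigmoid is monotonic and bounded, gradients can push a Gaussian toward fully transparent or fully opaque without the parameter ever leaving a valid range.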

  2. The abstract mentions: "preserve desirable properties of continuous volumetric radiance fields for scene optimization". How should I understand this relationship?

This is a rather abstract question.

  3. Section 2.2 of the paper mentions: "The use of MVS-based geometry is a major drawback of most of these methods". What is MVS-based geometry? Is it a sparse point cloud?

MVS usually means any geometry obtained after the dense correspondence and outlier clean-up of multi-view stereo. This is often a 3D mesh, but it could also be a dense point cloud. It is a drawback because it is often polluted by many errors.

  4. Section 2.3 of the paper mentions: "Our rasterization respects visibility order in contrast to their order-independent method." What does visibility order mean?

This just tries to explain that, because of alpha blending, order and visibility are respected. This is in contrast to order-independent transparency, which does a weighted average using the inverse depth as a weighting factor to give priority to the front-most points.
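To make the order dependence concrete, here is a small sketch of front-to-back alpha blending (not the repository's CUDA rasterizer, just the textbook recurrence it implements per pixel):

```python
def blend_front_to_back(samples):
    # samples: (color, alpha) pairs sorted from nearest to farthest.
    # Accumulated transmittance gives strict priority to front samples.
    color, transmittance = 0.0, 1.0
    for c, a in samples:
        color += transmittance * a * c   # this sample's visible contribution
        transmittance *= (1.0 - a)       # light remaining for samples behind
    return color

# a fully opaque white sample in front completely hides a black one behind it
front_first = blend_front_to_back([(1.0, 1.0), (0.0, 1.0)])   # 1.0
back_first  = blend_front_to_back([(0.0, 1.0), (1.0, 1.0)])   # 0.0
```

Swapping the sample order changes the result, which is exactly what "respects visibility order" means; an order-independent weighted average would give the same answer regardless of ordering.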

  5. What is fast α-blending? I am new to computer graphics.

I am not sure where this is referred to, but I would take a wild guess and say it is nothing too fancy, just a fast implementation of alpha blending.

  6. Why can the covariance be described by an R matrix and an S matrix?

I would suggest you read the paper more carefully on this matter and think about it from a linear algebra perspective. Consider what a covariance matrix is and what properties it has, and convince yourself why RSS^TR^T is always a covariance matrix.
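A quick numeric check of that factorization, using a 2D rotation for brevity (the paper uses 3D rotations from quaternions; the construction is the same):

```python
import numpy as np

theta = 0.3
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])  # rotation matrix
S = np.diag([2.0, 0.5])                          # per-axis scaling matrix
Sigma = R @ S @ S.T @ R.T                        # = R S S^T R^T

# Sigma is symmetric and positive semi-definite by construction,
# which is exactly what characterizes a covariance matrix
```

This parameterization guarantees a valid covariance during optimization, whereas directly optimizing the entries of a symmetric matrix could leave the positive semi-definite set.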

  7. Section 2.3 of the paper mentions: "computes the color C of a pixel by blending N ordered points overlapping the pixel." How should I understand N? Can it be understood as the number of Gaussians overlapping along a ray through the pixel?

Yes

  8. A 3D Gaussian is used as the basic element, so what is the physical meaning of the value a 3D Gaussian outputs? Can it be understood as a probability distribution describing opacity centered on that point?

Yes, it could, but not exactly, mostly because it won't integrate to 1: we skip the normalisation step with the determinant to allow big, opaque Gaussians.
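A sketch of the distinction, with illustrative names: the per-sample opacity is the Gaussian falloff times the learned alpha, with the probability-density normalisation constant deliberately dropped.

```python
import numpy as np

def unnormalised_gaussian(x, mean, Sigma, alpha):
    # Gaussian falloff times per-Gaussian opacity; the
    # 1/sqrt((2*pi)^k * det(Sigma)) normalisation is skipped, so the peak
    # value equals alpha no matter how large the Gaussian's extent is
    d = np.asarray(x, dtype=float) - np.asarray(mean, dtype=float)
    return alpha * np.exp(-0.5 * d @ np.linalg.solve(Sigma, d))

# at its center, a huge Gaussian is just as opaque as a tiny one
peak_big   = unnormalised_gaussian([0, 0, 0], [0, 0, 0], np.eye(3) * 100.0, 0.7)
peak_small = unnormalised_gaussian([0, 0, 0], [0, 0, 0], np.eye(3) * 0.01, 0.7)
```

With the determinant factor included, a large Gaussian's peak density would shrink toward zero, making big opaque surfaces impossible to represent.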

  9. Section 5.1 of the paper mentions: "Inevitably, geometry may be incorrectly placed due to the ambiguities of 3D to 2D projection." What do the ambiguities refer to here? I think 3D-to-2D projection is a deterministic process. Does this refer to errors caused by floating-point precision, or to collinear 3D points?

You are right that 3D to 2D is deterministic. Maybe this is not the most successful phrasing, but it tries to say that extracting 3D geometry from 2D projections is ambiguous.

  10. Section 5.1 of the paper mentions: "An effective way to moderate the increase in the number of Gaussians is to set the α value close to zero every N = 3000 iterations". α means the opacity, which is learned; does setting it to zero every 3000 steps mean reinitializing it directly, so that α is relearned? I checked the code, but I can't understand why it does this.

First of all, not everything is networks :) Doing gradient descent doesn't mean there is a network somewhere. This is a rather unimportant trick that we saw sometimes helps in two ways. First, when we reset alpha, the optimisation will only increase the alpha of the Gaussians that are really necessary, and it gives us the opportunity to prune the Gaussians that stay at low alpha. Second, when floaters appear, the optimisation often gets stuck in a local minimum because rays terminate early on the floaters, so it has no chance to see that behind them we have a perfectly reconstructed scene. Resetting all alpha values gives the optimisation a small window of opportunity to converge to a better local minimum by removing the floater.
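A minimal sketch of the reset-then-prune cycle described above. Function names and thresholds are illustrative, not the repository's exact code (the implementation clamps the raw opacity parameter rather than alpha itself, in the same spirit):

```python
import numpy as np

def reset_opacity(alpha, ceiling=0.01):
    # periodically cap every opacity near zero; gradient descent then
    # raises alpha again only for the Gaussians the loss actually needs
    return np.minimum(alpha, ceiling)

def prune(alpha, threshold=0.005):
    # Gaussians whose alpha never recovers after a reset are removed
    return alpha[alpha > threshold]

alphas = np.array([0.9, 0.4, 0.003])
after_reset = reset_opacity(alphas)       # everything capped at 0.01
survivors = prune(np.array([0.9, 0.003])) # only the recovered Gaussian stays
```

Between the reset and the prune, the optimisation runs for a while; only Gaussians that the loss pushes back up above the threshold survive.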


Best, George

Bin-ze commented 1 year ago

Thank you very much for your wonderful reply. I still have some questions to ask:

  1. For the rendering formula (3) in Section 2.3, is there a one-to-one correspondence between $c_i$ and $\alpha_i$? For example, does each sparse point correspond to a Gaussian with that point as its mean, together with one opacity α and one color c? If so, how is the color computed at locations on the 2D image where there are no projected points? Are the 2D pixels inside the ellipse represented by the Gaussian assigned that color directly, weighted by the opacity?
  2. Does the algorithm not require ray marching? Is it enough to project the 3D Gaussians within the viewing frustum onto the 2D plane and then do the blending?
  3. What if the SfM points were more precise or dense? Would that give better results?
  4. Is the algorithm independent of scene size? If scene size affects the result, what are the most likely factors, such as the number of points or the number of optimization steps?

Best, binze

Snosixtyboo commented 1 year ago
  1. Yes
  2. Yes
  3. Most likely, yes!
  4. It is independent of scene SIZE, but not of scene COMPLEXITY. If you scale up the same scene (and cameras) spatially, things should still work the same. But if you try to do an entire city district instead of a couple of objects, the spatial learning rate will probably be too high (see FAQ) and will probably need to be adapted.
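A sketch of the adaptation the FAQ points at. The helper name is hypothetical, but it illustrates why a uniformly scaled scene behaves the same: position step sizes grow with the scene's spatial extent, so only relative motion matters.

```python
def position_lr(base_lr, scene_extent):
    # hypothetical helper: scale the position learning rate by the scene's
    # spatial extent, so doubling all coordinates doubles the step size
    # and the optimisation trajectory is unchanged in relative terms
    return base_lr * scene_extent

small_scene = position_lr(0.5, 4.0)   # 2.0
big_scene   = position_lr(0.5, 8.0)   # 4.0
```

For a city-district-sized capture, the extent estimate (and hence this learning rate) can end up far too large for the fine detail you care about, which is why manual adaptation is suggested.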

Hth, Bernhard

Bin-ze commented 1 year ago

Thank you for your reply!