sxyu / svox2

Plenoxels: Radiance Fields without Neural Networks

How is the initial guess determined for the non-linear optimization? #87

Closed jasjuang closed 1 year ago

jasjuang commented 1 year ago

Hi,

I have one question after reading the paper. How are the initial values of the density and spherical harmonic coefficients for each voxel determined before running the non-linear optimization? I can't seem to find this mentioned in the paper. Based on conventional wisdom, the initial guess has to be close enough to the solution, otherwise the non-linear optimization will get stuck at a bad local minimum. One classical example is camera calibration, where the extrinsics have to be estimated via solvePnP before going into the non-linear optimizer; otherwise the calibration result will be far from accurate.

Thanks, Jason

sarafridov commented 1 year ago

We use a constant initialization with a small positive value (0.1) for density and zero for the spherical harmonic coefficients.
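For concreteness, a minimal sketch of that initialization (resolution, names, and layout are my own illustration, not the actual svox2 API; Plenoxels uses degree-2 spherical harmonics, i.e. 9 coefficients per color channel):

```python
import torch

# Illustrative sketch of the constant initialization described above;
# the grid resolution and tensor layout here are assumptions.
res = 128            # voxels per axis
sh_per_channel = 9   # degree-2 spherical harmonics: 9 coefficients per color channel

density = torch.full((res, res, res, 1), 0.1)                 # small positive density everywhere
sh_coeffs = torch.zeros((res, res, res, 3 * sh_per_channel))  # all 27 SH coefficients start at zero
```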

jasjuang commented 1 year ago

Can you provide insight into why the non-linear optimization can still yield good results when the constant initialization you mentioned is obviously very far from the solution? How does the optimization avoid getting stuck in a bad local minimum?

sarafridov commented 1 year ago

Most of modern machine learning is based on solving nonconvex optimization problems from random (or, in our case, constant) initializations and getting decent, if not necessarily optimal, solutions. If we know a problem is convex then it can certainly be optimized, but a lack of convexity doesn't guarantee failure. In this case, our objective function is nonconvex but still sufficiently well-behaved that the optimization produces good results.

One other comment about our situation specifically: the structure of the problem has a strong influence on how easily it can be optimized. In particular, we compared nearest-neighbor interpolation to trilinear interpolation and found that with nearest-neighbor interpolation the optimization does get stuck in a bad local minimum, whereas with trilinear interpolation it optimizes much better. We believe this is because nearest-neighbor interpolation produces a function that is discontinuous at voxel boundaries, making gradient-based optimization difficult.
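A tiny 1D sketch (my own illustration, not project code) of why the two schemes behave so differently under gradient descent:

```python
import torch

grid = torch.tensor([0.0, 1.0, 4.0, 2.0], requires_grad=True)  # values at integer voxel centers

def nearest(x):
    # Piecewise constant in x: the output jumps at voxel boundaries, and each
    # grid value's gradient weight is a 0/1 indicator that switches abruptly.
    return grid[torch.round(x).long().clamp(0, grid.numel() - 1)]

def linear(x):
    # Continuous in x: the gradient spreads over the two neighboring grid
    # values, with weights that vary smoothly with the sample position.
    i = x.floor().long().clamp(0, grid.numel() - 2)
    t = x - i.float()
    return (1 - t) * grid[i] + t * grid[i + 1]
```

Here `linear(torch.tensor(1.3))` blends the two neighboring voxel values (0.7 * 1.0 + 0.3 * 4.0 = 1.9), and that blend changes smoothly as the sample point moves, which is what keeps the loss landscape friendly to gradient descent.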

jasjuang commented 1 year ago

Modern machine learning that uses neural networks to brute-force approximate non-convex optimization problems can use random initialization of the weights because, conceptually, the weights represent both the parameters and the objective function. However, Plenoxels does not use neural networks; it uses classical non-linear optimization where the objective function is handcrafted and fixed. Therefore, how the parameters are initialized should matter.

Can you elaborate on, and quantify, what you mean by "sufficiently well-behaved" for your objective function?

sarafridov commented 1 year ago

It's true that we don't use neural networks, but for both Plenoxel optimization and neural-net optimization (a) the local structure of the loss landscape depends on the current values of the parameters, and (b) the objective function that mathematically combines the parameters into a loss value is fixed (for a neural net, this is set by the architecture). The main difference is that the parameters in Plenoxels are allocated a bit differently than in a neural network, and as such they take on physical meaning (e.g. a single parameter denotes the density at a specific point in space).
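To make the parallel concrete, here is a hedged toy example (not svox2 code): directly optimizing grid values against a fixed least-squares objective from the constant initialization, the same pattern Plenoxels follows at scale with its photometric loss.

```python
import torch

# Toy stand-in for the real photometric loss: fit a 1D "density" grid to
# samples of a target signal, starting from the constant 0.1 initialization.
torch.manual_seed(0)
n = 32
grid = torch.full((n,), 0.1, requires_grad=True)  # parameters with direct physical meaning
xs = torch.linspace(0, n - 1, 200)
target = torch.sin(xs / 5.0) + 1.0                # arbitrary smooth target signal

def interp(g, x):
    # Linear interpolation of grid values (the 1D analogue of trilinear)
    i = x.floor().long().clamp(0, n - 2)
    t = x - i.float()
    return (1 - t) * g[i] + t * g[i + 1]

opt = torch.optim.Adam([grid], lr=0.05)
for step in range(500):
    loss = ((interp(grid, xs) - target) ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
# The loss converges to near zero despite the constant start being
# "very far from the solution".
```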