graphdeco-inria / gaussian-splatting

Original reference implementation of "3D Gaussian Splatting for Real-Time Radiance Field Rendering"
https://repo-sam.inria.fr/fungraph/3d-gaussian-splatting/

Question about Gaussian normalization in the paper and alpha blending implementation in the code #294

Open KaziiBotashev opened 1 year ago

KaziiBotashev commented 1 year ago

Dear authors, thank you for this outstanding work.

I have some questions related to the alpha blending implementation in the code.

In lines 336-359 of forward.cu, alpha blending is done as follows:

```cpp
float4 con_o = collected_conic_opacity[j];
float power = -0.5f * (con_o.x * d.x * d.x + con_o.z * d.y * d.y) - con_o.y * d.x * d.y;
if (power > 0.0f)
    continue;

// Eq. (2) from 3D Gaussian splatting paper.
// Obtain alpha by multiplying with Gaussian opacity
// and its exponential falloff from mean.
// Avoid numerical instabilities (see paper appendix).
float alpha = min(0.99f, con_o.w * exp(power));
if (alpha < 1.0f / 255.0f)
    continue;
float test_T = T * (1 - alpha);
if (test_T < 0.0001f)
{
    done = true;
    continue;
}

// Eq. (3) from 3D Gaussian splatting paper.
for (int ch = 0; ch < CHANNELS; ch++)
    C[ch] += features[collected_id[j] * CHANNELS + ch] * alpha * T;
T = test_T;
```
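
For experimenting with this loop outside CUDA, here is a minimal host-side C++ sketch of the same per-pixel, front-to-back blending. It assumes the Gaussians are already projected and sorted by depth; the `Splat` struct, `blend_pixel`, and the field layout are our own illustration, not code from the repository, and the `done` flag becomes a `break` because only one pixel is processed.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cmath>
#include <vector>

constexpr int CHANNELS = 3;

// Hypothetical per-Gaussian record mirroring what the CUDA loop reads:
// conic_* = inverse 2D covariance (xx, xy, yy), opacity = con_o.w.
struct Splat {
    float mean_x, mean_y;
    float conic_x, conic_y, conic_z, opacity;
    std::array<float, CHANNELS> color;
};

// Front-to-back alpha blending for one pixel; splats must be depth-sorted.
std::array<float, CHANNELS> blend_pixel(const std::vector<Splat>& splats,
                                        float px, float py, float* T_out) {
    std::array<float, CHANNELS> C{};  // accumulated color, starts at zero
    float T = 1.0f;                   // transmittance
    for (const Splat& s : splats) {
        assert(s.opacity >= 0.0f && s.opacity <= 1.0f);
        float dx = s.mean_x - px, dy = s.mean_y - py;
        // Unnormalized 2D Gaussian exponent, same form as the CUDA code.
        float power = -0.5f * (s.conic_x * dx * dx + s.conic_z * dy * dy)
                      - s.conic_y * dx * dy;
        if (power > 0.0f) continue;
        float alpha = std::min(0.99f, s.opacity * std::exp(power));
        if (alpha < 1.0f / 255.0f) continue;
        float test_T = T * (1.0f - alpha);
        if (test_T < 0.0001f) break;  // CUDA version sets done = true
        for (int ch = 0; ch < CHANNELS; ++ch)
            C[ch] += s.color[ch] * alpha * T;
        T = test_T;
    }
    if (T_out) *T_out = T;
    return C;
}
```

With two half-opaque splats centered on the pixel, the first contributes 0.5 of its color, the second 0.25, and T ends at 0.25. No normalization factor enters anywhere, which is the point the question raises.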

Following the EWA splatting paper, the final C[ch] is equivalent to [image] (omitting the low-pass filter), with [image] and [image].

It seems to me that, to compute the final color value, we also need to multiply by a normalization factor: the product of the determinant of the Jacobian, the determinant of the camera rotation (which is 1, since the rotation is orthonormal), and the square root of the covariance matrix [image]. If I do this, I end up with just the square root of the V_k matrix (in the world reference frame).

However, I can't find any of these determinants or related multiplications in either the forward or the backward pass: only the exponential part is used, without normalization, and this confuses me a lot. The Jacobian is not constant; it depends on the positions (3D means) of the Gaussians, so we can't simply omit it, nor det(V_k), which is directly optimized.

I would be very grateful if you could clarify either where we do that part or why we don't need to do it.

Thank you in advance!

KaziiBotashev commented 1 year ago

Dear @grgkopanas,

Could you please take a look at this question? Many thanks in advance!

grgkopanas commented 1 year ago

We have our best guy looking at it :) It's indeed an interesting observation.

f-dy commented 1 year ago

Normalization and alpha play the same role in the equations, so you can think of alpha as "normalization*the_real_alpha". I actually prefer not having the normalization term (as it is now), because the Gaussians are not the result of blurring a Dirac: I see them as a "mass of stuff". If there were a normalization term, large Gaussians would need an alpha value way larger than 1, which makes little sense. I prefer to see the Gaussians as blobs, where alpha is the opacity at the center.
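
To put numbers on this argument, a minimal C++ sketch (helper names are ours; it assumes a 2D screen-space covariance Σ = [[a, b], [b, c]] in pixels). A normalized 2D Gaussian peaks at 1/(2π sqrt(det Σ)), so reproducing a given center opacity with a normalized kernel needs a weight of opacity · 2π sqrt(det Σ).

```cpp
#include <cassert>
#include <cmath>

// Peak value of a *normalized* 2D Gaussian with covariance [[a, b], [b, c]].
double normalized_peak(double a, double b, double c) {
    const double pi = std::acos(-1.0);
    double det = a * c - b * b;
    assert(det > 0.0);  // covariance must be positive definite
    return 1.0 / (2.0 * pi * std::sqrt(det));
}

// Weight a normalized kernel would need so that its peak equals the desired
// center opacity (the quantity the current code stores directly as alpha).
double required_weight(double center_opacity, double a, double b, double c) {
    return center_opacity / normalized_peak(a, b, c);
}
```

For an opaque Gaussian with a 10 px standard deviation the required weight is 200π ≈ 628, far above 1; for a 0.1 px Gaussian it is about 0.063.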

slefkimmiatis commented 1 year ago

I am not sure that the normalization term @KaziiBotashev refers to can be absorbed into the alpha parameter (that would indeed be very convenient in terms of implementation simplicity). The reason, unless I am wrong, is that the opacity of the volume is independent of the camera view, while the normalization term depends on it directly through the Jacobian J_k, which involves the camera rotation and translation.

@grgkopanas do you have any updates that you could share with us about this issue?

KaziiBotashev commented 1 year ago

@f-dy With the normalization term, a large Gaussian would have a normalization term value way larger than 1. That effect might be compensated by the learned "real alpha" value (for large Gaussians we would have a large normalization term and a small trained real alpha). Could you please elaborate a bit on why it makes little sense?

adam-ce commented 1 year ago

Regarding the normalisation of the Gaussian based on det(covariance): to my understanding, mathematically it makes no difference, since it can be baked into alpha. Numerically it might make a difference; I don't know. Performance-wise it's faster not to compute the normalisation, but that's done in the preprocess phase, so it should not really matter.

Regarding the normalisation based on the Jacobian: to my understanding, it boils down to keeping the integral of the transformed Gaussian the same as that of the untransformed one. If the Gaussian (or its 1-sigma iso-ellipsoid) becomes larger, the scaling factor is < 1. So let's say we have a Gaussian with alpha 1.0, i.e. completely opaque, and we are closing in on it. It will eventually become stretched. With the Jacobian normalisation it would become transparent; without it, it stays opaque. The authors apparently decided to keep it opaque. That's at least my theory; I still don't completely understand all of it.
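
The integral argument can be checked numerically. The sketch below (a hypothetical helper in plain C++) sums the unnormalized weight opacity · exp(power), with `power` written as in the blending loop, over a fine grid. The result matches the closed form opacity · 2π sqrt(det Σ), so growing the projected footprint by 2x along both axes quadruples the total "mass" unless a factor proportional to 1/sqrt(det Σ) is applied.

```cpp
#include <cassert>
#include <cmath>

// Midpoint Riemann sum of opacity * exp(-0.5 * d^T Sigma^{-1} d) over the
// square [-half_width, half_width]^2, with Sigma = [[a, b], [b, c]].
double splat_mass(double opacity, double a, double b, double c,
                  double half_width, double step) {
    double det = a * c - b * b;
    assert(det > 0.0 && step > 0.0);
    // Conic (inverse covariance), laid out like con_o.x / con_o.y / con_o.z.
    double ia = c / det, ib = -b / det, ic = a / det;
    double sum = 0.0;
    for (double x = -half_width + 0.5 * step; x < half_width; x += step)
        for (double y = -half_width + 0.5 * step; y < half_width; y += step)
            sum += opacity * std::exp(-0.5 * (ia * x * x + ic * y * y) - ib * x * y);
    return sum * step * step;
}
```

For Σ = diag(4, 9) the sum comes out at about 2π · 6 ≈ 37.70, and for diag(16, 36) it is four times larger.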

f-dy commented 1 year ago

OK I found one place where normalization consideration is missing, this is where the 2D Gaussian is convolved with an isotropic 2D Gaussian of sigma sqrt(0.3) to simulate pixel integration (this is not in the paper):

Let us say you have a 2D Gaussian with an opacity of 1 at the center. When it is convolved with another 2D Gaussian and the opacity is left unchanged (as it currently is), the Gaussian becomes larger while remaining opaque, and it may obscure Gaussians behind it (we observed that on grid patterns).

Take an extreme case where the original Gaussian has size 0.1 and opacity 1, and we blur it with a Gaussian of sigma 10. The result is a Gaussian with sigma=sqrt(0.1^2+10^2), but the opacity shouldn't be 1!

Instead, the opacity should be reduced so that the integrated opacity of the resulting Gaussian is the same as the original one. Thus in the 3DGS code the opacity should be multiplied by the factor sqrt(det(Sigma)/det(Sigma+diag(0.3,0.3))). @grgkopanas

In the above example, the factor (and thus the final opacity at the center of the 2D Gaussian) would be sqrt(0.1^4/(0.1^2+10^2)^2) = 0.01/100.01 ≈ 0.0001.
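
A minimal C++ sketch of the compensation factor described above, assuming the 2D covariance is given as Σ = [[a, b], [b, c]] in pixels and the dilation is an isotropic variance added to the diagonal (0.3 by default, as in the thread). The function name is ours.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// sqrt(det(Sigma) / det(Sigma + dilation * I)) for Sigma = [[a, b], [b, c]].
// Multiplying the opacity by this keeps the integrated opacity unchanged
// when the Gaussian is convolved with the isotropic dilation kernel.
double opacity_compensation(double a, double b, double c, double dilation = 0.3) {
    double det = a * c - b * b;
    double det_dilated = (a + dilation) * (c + dilation) - b * b;
    assert(det_dilated > 0.0);
    return std::sqrt(std::max(det, 0.0) / det_dilated);
}
```

With variance 0.01 blurred by variance 100 (the extreme example), this gives 0.01/100.01 ≈ 0.0001; for a large Gaussian with variance 100 and the default 0.3 dilation it is about 0.997, so only small splats are affected noticeably.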

Snosixtyboo commented 1 year ago

That is entirely correct!

We discovered this some time ago. We tested it with and without proper compensation, but we found it has no measurable impact on image quality according to standard metrics. So we left it the same way it was used for the paper evaluation.

Hth, Bernhard

jb-ye commented 1 year ago

The impact on standard metrics may be small because the validation images are rendered at a similar distance to the training images. If you capture data at a distance of 0.5 m and render it at 2 m, or with a different focal length, the effect can be obvious.
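
To see why the viewing distance matters, here is a sketch with made-up numbers: a hypothetical helper assuming an isotropic Gaussian on the optical axis, a 1000 px focal length, and a 1 mm world-space standard deviation. For an isotropic screen covariance v · I, the factor sqrt(det(v I) / det((v + 0.3) I)) reduces to v / (v + 0.3).

```cpp
#include <cassert>

// Screen-space variance of an on-axis isotropic Gaussian is (f * s / z)^2,
// with f in pixels and s, z in meters. Returns the opacity compensation
// factor for an isotropic dilation of the given variance.
double compensation_at_depth(double f, double s, double z, double dilation = 0.3) {
    assert(f > 0.0 && s > 0.0 && z > 0.0);
    double sigma_px = f * s / z;
    double v = sigma_px * sigma_px;
    return v / (v + dilation);
}
```

At 0.5 m the Gaussian spans about 2 px and the factor is about 0.93, so the missing compensation barely matters. At 2 m it shrinks to 0.5 px and the factor drops to about 0.45, so by the argument above the uncompensated splat is roughly 2.2 times too opaque.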

f-dy commented 1 year ago

+1 we've seen a very visible impact when rendering from a different distance

f-dy commented 12 months ago

Hi Bernhard @Snosixtyboo, would it be possible to have that at least as an option? I'm not a CUDA expert, and find it difficult to compute a scalar here and use it somewhere else, but since you did it before, could you share the solution?
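
One possible way to "compute a scalar here and use it somewhere else", sketched under assumptions rather than taken from the authors: compute the factor where the 2D covariance is dilated and the conic is built, then multiply it into the opacity slot (`con_o.w`) that the blending loop already reads, so the per-pixel loop needs no change. `Float4` is a stand-in for CUDA's `float4`, and the function name is hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Host-side stand-in for CUDA's float4.
struct Float4 { float x, y, z, w; };

// Given the un-dilated 2D covariance [[a, b], [b, c]] and the learned opacity,
// return what the render loop reads as con_o:
// (inv_xx, inv_xy, inv_yy, opacity * compensation).
Float4 conic_opacity_with_compensation(float a, float b, float c, float opacity,
                                       float dilation = 0.3f) {
    float det0 = a * c - b * b;   // determinant before dilation
    a += dilation;
    c += dilation;
    float det = a * c - b * b;    // determinant after dilation
    assert(det > 0.0f);
    float det_inv = 1.0f / det;
    float compensation = std::sqrt(std::fmax(det0 * det_inv, 0.0f));
    // Conic = inverse of the dilated covariance, in the layout the loop expects.
    return {c * det_inv, -b * det_inv, a * det_inv, opacity * compensation};
}
```

Keeping the factor in the opacity slot means the backward pass would also need the derivative of the compensation with respect to the covariance; this sketch covers only the forward computation.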

jb-ye commented 12 months ago

Here are two videos demonstrating the effect mentioned by @f-dy. The first video uses the 2D convolution without opacity compensation: as the render camera moves from far to close (near the captured distance), the color of the grid pattern on the acoustic amplifier changes, creating an aliasing-like effect (though it is not aliasing). The second video uses the 2D convolution with opacity compensation.

The demonstration was done using a third party implementation (https://github.com/wanmeihuali/taichi_3d_gaussian_splatting/tree/main/taichi_3d_gaussian_splatting).

https://github.com/graphdeco-inria/gaussian-splatting/assets/132313008/17a9e614-8a09-45cc-a2a4-59722bed17de

https://github.com/graphdeco-inria/gaussian-splatting/assets/132313008/2c0de817-3f23-4a27-a927-6f5318f2ca54

tdzdog commented 11 months ago

Another question about the alpha: I notice that the alpha formulation in Eq. (2) of the paper differs from the implementation in the code. It is 1 - exp in the paper but exp in the code. Can anyone explain the reason? @KaziiBotashev @Snosixtyboo @grgkopanas

ys-koshelev commented 11 months ago

@tdzdog I believe the typo is not in the code but in the paper, where Eq. 2 should be $\alpha_i = \exp(-\sigma_i \delta_i)$ instead of $\alpha_i = 1 - \exp(-\sigma_i \delta_i)$.

lxndrrss commented 10 months ago

@tdzdog @ys-koshelev Eq. 2 in the paper is definitely correct, $\alpha_i = (1-\text{exp}(-\sigma_i \delta_i))$ (The opacity $\alpha$ should get bigger for greater density $\sigma$ or interval $\delta$). But this is not done in the code at all, as Eq. 2 describes the raymarching approach of NeRF-like volumetric representations. In Gaussian Splatting each Gaussian stores the opacity $\alpha$ directly. The exp(power) in line 343 of forward.cu (that I assume @tdzdog is referring to) represents the evaluation of the projected 2D Gaussian.
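
The distinction can be sketched side by side (function names are ours): a NeRF-style ray-marching alpha derived from density and step length, as in Eq. 2, versus the splatting alpha in the forward.cu excerpt, which is a stored opacity modulated by the projected 2D Gaussian's falloff.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// NeRF-style alpha for a ray segment of length delta through density sigma
// (Eq. 2): grows toward 1 as sigma * delta grows.
float nerf_alpha(float sigma, float delta) {
    assert(sigma >= 0.0f && delta >= 0.0f);
    return 1.0f - std::exp(-sigma * delta);
}

// Splatting alpha as in the forward.cu excerpt: stored opacity times the
// 2D Gaussian falloff exp(power), where power <= 0, clamped at 0.99.
float splat_alpha(float opacity, float power) {
    return std::min(0.99f, opacity * std::exp(power));
}
```

At the Gaussian's center (`power` = 0), `splat_alpha` simply returns the stored opacity, so the `exp` in the code is the spatial falloff, not a transmittance term.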