graphdeco-inria / gaussian-splatting

Original reference implementation of "3D Gaussian Splatting for Real-Time Radiance Field Rendering"
https://repo-sam.inria.fr/fungraph/3d-gaussian-splatting/

Any way to check the backward implementation of the Gaussian rendering? #423

Open sta105 opened 11 months ago

sta105 commented 11 months ago

Hello,

I am trying to improve the Gaussian rendering code. However, when I use torch.autograd.gradcheck to verify the correctness of the gradients computed by the CUDA code, all the analytical gradients differ substantially from the numerical gradients (the same issue also occurs with the original implementation of diff_gaussian_rasterization). Is there any way to check whether the backward part of the Gaussian rendering is correct? Any ideas are appreciated.
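For reference, gradcheck compares an analytical gradient against a central finite difference. A minimal pure-Python sketch of that comparison (the functions f and grad_f are hypothetical stand-ins for a forward/backward pair, not the rasterizer):

```python
# Central-difference gradient check, mimicking what torch.autograd.gradcheck does.
# f and grad_f are toy stand-ins for a forward pass and its hand-written backward.

def f(x):
    # toy forward: sum of squares
    return sum(v * v for v in x)

def grad_f(x):
    # hand-written "analytical" gradient of f
    return [2.0 * v for v in x]

def numerical_grad(func, x, eps=1e-6):
    # perturb each input element in turn: two forward calls per element
    g = []
    for i in range(len(x)):
        xp = list(x); xp[i] += eps
        xm = list(x); xm[i] -= eps
        g.append((func(xp) - func(xm)) / (2.0 * eps))
    return g

def gradcheck_simple(func, grad_func, x, atol=1e-4):
    ana = grad_func(x)
    num = numerical_grad(func, x)
    return all(abs(a - n) <= atol for a, n in zip(ana, num))
```

Note that torch.autograd.gradcheck additionally expects float64 inputs with requires_grad=True; in float32 the finite-difference estimate itself is too noisy to be a trustworthy reference.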

kwea123 commented 11 months ago

Sometimes there is enormous numerical error due to two facts:

  1. the inverse of cov2D is numerically unstable when cov2D has a near-zero determinant
  2. with atomicAdd, the result varies a lot depending on the order of the additions. In floating point, a+b+c can yield a very different result than c+b+a depending on the values. This is something we cannot control.

So even if you run the backward function with the same inputs, the results are NOT guaranteed to be the same (or even close) each time. I think it's hard to check with the current code, unless you rewrite the implementation and remove all of the above numerical instabilities (which is not obvious to me).
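Both effects are easy to reproduce in isolation. A small sketch (toy values, not the rasterizer's actual code):

```python
# 1) Floating-point addition is not associative, so the order in which
#    atomicAdd accumulates gradients changes the result.
a, b, c = 0.1, 0.2, 0.3
left = (a + b) + c
right = a + (b + c)
assert left != right  # the two summation orders disagree in the last bits

# 2) Inverting a 2x2 covariance with a near-zero determinant amplifies
#    tiny input perturbations into large output changes.
def inv2x2(m):
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

m1 = [[1.0, 1.0], [1.0, 1.0 + 1e-12]]   # det ~ 1e-12
m2 = [[1.0, 1.0], [1.0, 1.0 + 2e-12]]   # d perturbed by only 1e-12
i1 = inv2x2(m1)[0][0]
i2 = inv2x2(m2)[0][0]
rel_change = abs(i1 - i2) / abs(i1)
assert rel_change > 0.1  # a 1e-12 perturbation moves the inverse by >10%
```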

LarkLeeOnePiece commented 6 months ago

Hello, I ran into the same problem when I changed the method. BTW, when I use torch.autograd.gradcheck, I found that backward is invoked many more times than forward. It also took about 2 days to run through the code, and the check failed with: "Backward is not reentrant, i.e., running backward with same input and grad_output multiple times gives different values, although analytical gradient matches numerical gradient. The tolerance for nondeterminism was 0.0." I am confused whether it is normal for torch.autograd.gradcheck to take so much time; after all, I am using an RTX 4090. @kwea123 @sta105
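That runtime is expected: a default gradcheck perturbs every element of every input, calling forward twice per element for the central difference, and calls backward once per output element to build the analytical Jacobian. A rough call-count model (counts only, with hypothetical sizes, not torch itself):

```python
# Rough call-count model for a default (non-fast-mode) gradcheck.
# n_in: total number of input scalars, n_out: total number of output scalars.
def gradcheck_cost(n_in, n_out):
    forward_calls = 2 * n_in   # two perturbed forwards per input element
    backward_calls = n_out     # one backward per output element
    return forward_calls, backward_calls

# Even a tiny hypothetical case: 1000 Gaussians with ~10 parameters each,
# rendered to a 64x64 RGB image.
fw, bw = gradcheck_cost(n_in=1000 * 10, n_out=64 * 64 * 3)
```

This is why the check can run for days on a full-size rasterizer, and why backward is called far more often than you might expect. As for the reentrancy failure, torch.autograd.gradcheck documents a nondet_tol argument for tolerating exactly the atomicAdd nondeterminism described above, so passing a small nondet_tol may be worth trying.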