Open sta105 opened 11 months ago
Sometimes there is enormous numerical error due to two facts:

1. atomicAdd accumulates contributions in a nondeterministic order across threads.
2. Floating-point addition is not associative, so the result varies a lot depending on the order of the addition: a+b+c can yield a very different result than c+b+a, depending on the values. This is something we cannot control.

So even if you run the backward function with the same inputs, the results are NOT guaranteed to be the same (or even close) each time. I think it's hard to check using the current code, unless you rewrite the implementation and remove all of the numerical instabilities above (and how to do that is not obvious to me).
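The order sensitivity described above is easy to reproduce even on the CPU. Here is a minimal sketch (plain Python, not the CUDA kernel itself) showing that floating-point addition is not associative, so two different atomicAdd schedules can produce different sums from identical inputs:

```python
# Floating-point addition is not associative: summing the same values
# in a different order can give a different result. This mimics what
# happens when atomicAdd accumulates gradients in a nondeterministic order.
values = [1.0] + [1e-16] * 10  # one large term, ten tiny terms

s_fwd = 0.0
for v in values:            # large term first: each 1e-16 is absorbed
    s_fwd += v

s_rev = 0.0
for v in reversed(values):  # tiny terms first: they accumulate to ~1e-15
    s_rev += v

print(s_fwd == s_rev)  # False: same values, different order, different sum
```

In the forward order every `1e-16` is below half an ulp of `1.0` and vanishes, while in the reverse order the tiny terms accumulate before the large one is added. A GPU backward pass summing thousands of per-pixel gradient contributions via atomicAdd is exactly this situation at scale.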
Hello, I met the same problem when I changed the method. BTW, when I use torch.autograd.gradcheck, I found that backward is invoked more often than forward. Also, torch.autograd.gradcheck takes about 2 days to run through my code before failing with "Backward is not reentrant, i.e., running backward with same input and grad_output multiple times gives different values, although analytical gradient matches numerical gradient. The tolerance for nondeterminism was 0.0". I am confused: is it normal for torch.autograd.gradcheck to take so much time? After all, I use an RTX 4090. @kwea123 @sta105
Hello,
I am trying to improve the Gaussian rendering code. However, when I use torch.autograd.gradcheck to verify the correctness of the gradients computed by the CUDA code, all the analytical gradients differ wildly from the numerical gradients (the same issue also occurs in the original implementation of diff_gaussian_rasterization). Is there any way to check whether the backward part of the Gaussian rendering is correct? Any ideas are appreciated.
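One pragmatic alternative to a full gradcheck run is to do the central-difference comparison yourself in double precision, with a tolerance loose enough to absorb the atomicAdd nondeterminism. Below is a minimal pure-Python sketch of the idea; the toy function `f` and the tolerance are illustrative assumptions, not the actual rasterizer:

```python
# Central-difference gradient check: the same idea torch.autograd.gradcheck
# uses, sketched in plain Python on a toy function f(x) = sum(x_i^2).
def f(x):
    return sum(v * v for v in x)

def analytic_grad(x):
    # Hand-derived gradient of f: df/dx_i = 2 * x_i
    # (for the rasterizer, this is what your CUDA backward computes)
    return [2.0 * v for v in x]

def numeric_grad(x, eps=1e-6):
    # Central differences: (f(x + eps*e_i) - f(x - eps*e_i)) / (2*eps)
    grad = []
    for i in range(len(x)):
        xp = list(x); xp[i] += eps
        xm = list(x); xm[i] -= eps
        grad.append((f(xp) - f(xm)) / (2.0 * eps))
    return grad

x = [0.5, -1.25, 3.0]
g_ana = analytic_grad(x)
g_num = numeric_grad(x)
# With a nondeterministic CUDA backward, run analytic_grad several times
# and include its observed spread when choosing the tolerance.
assert all(abs(a - n) < 1e-4 for a, n in zip(g_ana, g_num))
```

For the real kernel you would also run the backward several times with identical inputs to measure its run-to-run spread; recent PyTorch versions expose a `nondet_tol` argument on torch.autograd.gradcheck precisely so the reentrancy check can tolerate that spread.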