ingra14m / Deformable-3D-Gaussians

[CVPR 2024] Official implementation of "Deformable 3D Gaussians for High-Fidelity Monocular Dynamic Scene Reconstruction"
https://ingra14m.github.io/Deformable-Gaussians/
MIT License

Depth Gradient Derivation #35

Open sairisheek opened 9 months ago

sairisheek commented 9 months ago

Hello, thank you for the great work and congratulations on your paper acceptance!

I was wondering how you derived the gradient for the depth from the camera plane. In particular, this part:

    // the w must be equal to 1 for view^T * [x,y,z,1]
    float3 m_view = transformPoint4x3(m, view);

    // Compute loss gradient w.r.t. 3D means due to gradients of depth
    // from rendering procedure
    glm::vec3 dL_dmean2;
    float mul3 = view[2] * m.x + view[6] * m.y + view[10] * m.z + view[14];
    dL_dmean2.x = (view[2] - view[3] * mul3) * dL_ddepth[idx];
    dL_dmean2.y = (view[6] - view[7] * mul3) * dL_ddepth[idx];
    dL_dmean2.z = (view[10] - view[11] * mul3) * dL_ddepth[idx];

    // That's the third part of the mean gradient.
    dL_dmeans[idx] += dL_dmean2;

Per my understanding, the depth calculated in the forward pass is simply the p_view.z computed in

in_frustum(idx, orig_points, viewmatrix, projmatrix, prefiltered, p_view) 

In this method (contained in auxiliary.h), the world-space 3D coordinate undergoes a rigid transform to the view-space 3D coordinate. In that case, wouldn't the gradient just be the view-matrix coefficients that map the world-space x, y, z to the view-space z-coordinate? Like:

    // aggregate depth gradients
    dL_dmean2.x = view[2] * dL_ddepth[idx];
    dL_dmean2.y = view[6] * dL_ddepth[idx];
    dL_dmean2.z = view[10] * dL_ddepth[idx];

Is there something I'm missing here? I'm simply trying to reverse-engineer what happened in the forward pass. Your help is much appreciated!

ingra14m commented 9 months ago

Hi, thanks for your interest.

Actually, in Deformable-GS we only used the main branch of depth-diff-gaussian-rasterization, which means depth has only a forward pass (no backward gradients).

Interestingly, in our experiments, depth supervision actually led to negative optimization (worse results). In more general experiments with vanilla 3D-GS, we also found that improvements in 3D-GS geometry did not lead to an enhancement in rendering quality.