pals-ttic / sjc

Score Jacobian Chaining: Lifting Pretrained 2D Diffusion Models for 3D Generation (CVPR 2023)
https://pals.ttic.edu/p/score-jacobian-chaining

Questions about center depth loss in the paper #21

Closed · thuliu-yt16 closed this issue 1 year ago

thuliu-yt16 commented 1 year ago

[image: screenshot of the center depth loss equation from the paper]

The equation gives NaN when

$$ \frac{1}{|\mathcal{B}|}\sum_{p\in\mathcal{B}} D(p) < \frac{1}{|\mathcal{B}^\complement|} \sum_{q\notin\mathcal{B}} D(q) $$

Does that mean the loss is only applied when the following holds?

$$ \frac{1}{|\mathcal{B}|}\sum_{p\in\mathcal{B}} D(p) > \frac{1}{|\mathcal{B}^\complement|} \sum_{q\notin\mathcal{B}} D(q) $$

But then the loss encourages the average center depth to be large, which would mean you want the object to be away from the scene center. Where did I go wrong?

w-hc commented 1 year ago

Very sorry, and thanks for the catch. This is a mistake. The bug went undetected because the backward function of log in PyTorch (intentionally) does not check the sign of the input, so the two mistakes cancel out and the resulting gradient is correct.
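
To see the behavior in isolation (a standalone snippet, not code from this repo):

    import torch

    x = torch.tensor(-2.0, requires_grad=True)
    y = torch.log(x)  # forward: log of a negative number is nan
    y.backward()

    print(y)       # tensor(nan)
    print(x.grad)  # tensor(-0.5): backward computes 1/x without checking the sign

Only the forward loss value is nan; the gradient that flows back is still 1/x, which is why the sign mistake went unnoticed.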

thuliu-yt16 commented 1 year ago

I agree then that the current implementation gives the correct gradient when the center is closer than the rest. However, when the center is farther away, the loss pushes the center depth even higher, moving the object away from the scene center.

Is it because the center depth is always smaller at initialization, so that the input to torch.log stays negative throughout the optimization? If that is the case, I guess the negative sign before the log should be removed whenever the log input is positive?

thuliu-yt16 commented 1 year ago

Something like

    # drop the leading minus sign whenever the log input is positive,
    # so the loss stops pushing the center depth up in that case
    depth_loss = torch.sign(depth_diff) * torch.log(depth_diff + 1e-12)
    depth_loss = depth_weight * depth_loss
w-hc commented 1 year ago

Here the volume density is initialized to (almost) 0 everywhere in the scene, and the background distance is set to 10.0 for all rays, so at the beginning every ray has a depth of 10. The small offset of 1e-12 keeps the log from blowing up to -inf. The method relies on the other terms to create density at the center.
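
For a rough sense of scale (back-of-the-envelope numbers, not measured from the code): with both means at about 10 the difference is essentially 0, so the loss evaluates to roughly -log(1e-12) ≈ 27.6 and the gradient through the log has magnitude on the order of 1e12.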

You are right that we should add a conditional check and not apply the loss when the difference is too small. Flipping the sign is also possible. In hindsight, not shifting the log and relying on its large gradients near zero is not good; we should probably shift by 1 and use an explicit scaling factor if we want larger gradients. We will update the writing and the code. Thanks for the help.
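
A rough sketch of the conditional version (the variable names, the mask, and the default values here are illustrative, not the final code):

    import torch

    def center_depth_loss(depth, center_mask, depth_weight=100.0, min_diff=0.0):
        # Mean depth inside the center box B and over the rest of the image.
        center_mean = depth[center_mask].mean()
        border_mean = depth[~center_mask].mean()
        # Positive when the image center is closer to the camera than the border.
        diff = border_mean - center_mean

        # Skip the loss when the difference is too small (or negative),
        # instead of letting the log produce nan or huge gradients.
        if diff <= min_diff:
            # A zero that stays attached to the graph so the caller can
            # call .backward() unconditionally.
            return depth.sum() * 0.0

        # Shift by 1 (log1p) instead of the 1e-12 offset, and use an explicit
        # weight for scaling rather than relying on the log blowing up near 0.
        return -depth_weight * torch.log1p(diff)

Skipping rather than flipping the sign matches the point above: the other terms create density at the center first, and this loss then reinforces the centering once the center is already closer than the border.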

thuliu-yt16 commented 1 year ago

Thank you for your explanation!