monniert / differentiable-blocksworld

[NeurIPS 2023] Code for "Differentiable Blocks World: Qualitative 3D Decomposition by Rendering Primitives"
https://www.tmonnier.com/DBW
MIT License

Alpha compositing differentiable rendering #6

Closed ktertikas closed 1 year ago

ktertikas commented 1 year ago

Hi!

First of all thank you for the great work and codebase! :smiley:

I wanted to ask about the design choice of adding a transparency value per primitive to the differentiable rendering process. Specifically, in the paper you mention that it behaves better during optimization than the standard differentiable rendering pipeline. What do you mean by better behavior? How much worse are the results when using the standard PyTorch3D renderer directly, and are there any examples you could share showing the difference? Finally, do you have any intuition as to why this happens?

Best, Konstantinos

monniert commented 1 year ago

Hi @ktertikas thanks for reaching out!

Compared to the standard PyTorch3D mesh renderer, we modify two things:

1. Softmax blending → Alpha-compositing blending

We proposed this formulation in our previous work UNICORN on single-view reconstruction because we observed that PyTorch3D simply does not work when learning geometry from RGB rendering comparisons alone (they use silhouettes to make it work). You can have a look at Appendix A for why this happens: basically, the standard softmax RGB blending formulation prevents gradients from flowing to the occupancy maps, which is necessary to update the geometry. We raised an issue in the PyTorch3D repo with working examples showing that our alpha-compositing blending function makes learning work with RGB comparisons only.
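For intuition, here is a minimal NumPy sketch of front-to-back alpha compositing over the depth-sorted faces intersected by one ray (it is an illustration of the blending rule, not the repo's actual implementation, and the function name is hypothetical). Each face contributes its color weighted by its own opacity times the transmittance of everything in front of it, so the gradient with respect to each face's alpha appears directly in the output color:

```python
import numpy as np

def alpha_composite(colors, alphas):
    """Front-to-back alpha compositing of K depth-sorted faces on one ray.

    colors: (K, 3) RGB of the intersected faces, ordered near to far.
    alphas: (K,) per-face opacity in [0, 1].
    Returns the blended RGB. A face's weight is its own alpha times the
    accumulated transmittance prod_{j<k} (1 - alpha_j) of the faces in front.
    """
    out = np.zeros(3)
    transmittance = 1.0
    for c, a in zip(colors, alphas):
        out += transmittance * a * c
        transmittance *= 1.0 - a
    return out
```

For example, a fully transparent front face (alpha = 0) lets the face behind it show through unchanged, while a half-transparent one mixes the two colors 50/50.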

2. Adding mesh transparency values

We incorporate transparency values inside the blending function to model a variable number of meshes in a differentiable manner: setting a mesh's transparency to 0 effectively removes it from the scene. This is our proposed solution for optimizing over the number of meshes differentiably; other solutions typically involve reinforcement learning or greedy algorithms, which are difficult to handle.
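The idea can be sketched as follows (a toy illustration, not the repo's code; the function and parameter names are hypothetical): each mesh gets one learnable logit, and the sigmoid of that logit scales the opacities of all the mesh's faces before blending. Driving a logit very negative makes the whole mesh vanish from the render while gradients still flow through the sigmoid:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def effective_alphas(face_alphas, mesh_ids, mesh_logits):
    """Scale each face's opacity by its mesh's global transparency.

    face_alphas: (F,) per-face opacity from rasterization.
    mesh_ids:    (F,) index of the mesh each face belongs to.
    mesh_logits: (M,) learnable per-mesh logits; sigmoid(logit) near 0
                 removes the whole mesh from the composite differentiably.
    """
    mesh_alpha = sigmoid(mesh_logits)          # (M,) values in (0, 1)
    return face_alphas * mesh_alpha[mesh_ids]  # (F,) effective opacities
```

This keeps the "how many primitives" decision inside the same gradient-based optimization as the geometry and texture, instead of requiring a discrete outer search.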

Let me know if this makes sense!

ktertikas commented 1 year ago

Hey @monniert , thanks for the great explanation!

I was not aware of the gradient issues on the current Pytorch3D Renderer, thanks for letting me know!

Re the transparency values: in my experience experimenting with parsimony losses similar to your proposed one, the number of primitives that eventually contribute to the scene (alpha > 0.5) is quite noisy, and can even change significantly across runs on the same scene (with a different seed per run). Did you experience something similar in your work?
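To make the criterion concrete, the count I am referring to is simply the number of primitives whose learned transparency clears the threshold (a trivial helper for illustration; the name is mine, not from the codebase):

```python
import numpy as np

def num_active(mesh_logits, threshold=0.5):
    """Count primitives whose sigmoid(logit) transparency exceeds threshold."""
    return int(np.sum(1.0 / (1.0 + np.exp(-mesh_logits)) > threshold))
```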

Best, Konstantinos

monniert commented 1 year ago

That's a good point. From what I observed, there is indeed some noise in the number of primitives contributing to the scene, but nothing unrealistic. For example, the BlendedMVS egg scene can sometimes be reconstructed with 2 or 3 primitives, with one sphere at the bottom and another small ellipsoid reconstructing the top.

However, as stated in the paper, we observed that the training rendering loss is a good proxy for automatically selecting the best run among several.

Closing the issue, feel free to reopen if needed!

ktertikas commented 1 year ago

Great, thanks for the help and the insights!