lpiccinelli-eth / UniDepth

Universal Monocular Metric Depth Estimation

why use exp? #17

Open yamoomd opened 2 months ago

yamoomd commented 2 months ago

Thank you for your work! I noticed in your code that the predictions for fx, fy, and depth all go through .exp(). Why did you choose to use exp?

lpiccinelli-eth commented 2 months ago

There is no particular reason, apart from the fact that those quantities lie in the [0, +inf) interval, so their distribution is better fit by a log-normal distribution. If the network outputs the log of those quantities, it is regressing normally distributed values.
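To make the log-space argument concrete, here is a minimal pure-Python sketch (not UniDepth's code; `decode_positive` and the toy mean/std are assumptions made up for illustration). It treats the raw network output as normally distributed and checks that exponentiating it gives strictly positive values whose logs are again the raw outputs.

```python
import math
import random

def decode_positive(raw):
    """Map an unconstrained real output to a strictly positive quantity."""
    return math.exp(raw)

rng = random.Random(0)
mu, sigma = 1.0, 0.5  # hypothetical mean/std of the raw (log-space) output
raw_outputs = [rng.gauss(mu, sigma) for _ in range(10_000)]
depths = [decode_positive(r) for r in raw_outputs]

# Every decoded value is positive, and taking the log of the decoded
# values recovers the normally distributed raw outputs.
mean_log = sum(math.log(d) for d in depths) / len(depths)
print(min(depths) > 0.0, round(mean_log, 2))
```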

I also find it more natural for the network, given its weights' initialization. Moreover, the output is differentiable everywhere and strictly positive; with ReLU you sacrifice the former to get the latter. In addition, the typical SILog loss is computed in log space anyway.
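As an illustration of why a log-space output pairs well with SILog, here is a small sketch of the scale-invariant log loss written directly on log predictions. This is an assumed, simplified form (square root of the mean squared log difference minus `lam` times the squared mean log difference), not necessarily the exact variant used in UniDepth; `silog_loss` and the `lam` default are illustrative.

```python
import math

def silog_loss(log_pred, target, lam=0.85):
    """Scale-invariant log loss on log-space predictions.

    log_pred: raw network outputs, interpreted as log-depth.
    target:   ground-truth depths (strictly positive).
    lam:      weight of the scale-invariant term (assumed default).
    """
    diffs = [lp - math.log(t) for lp, t in zip(log_pred, target)]
    n = len(diffs)
    mean_sq = sum(d * d for d in diffs) / n
    sq_mean = (sum(diffs) / n) ** 2
    # Clamp at zero to guard against tiny negative values from rounding.
    return math.sqrt(max(mean_sq - lam * sq_mean, 0.0))

target = [1.0, 2.0, 4.0, 8.0]
perfect = [math.log(t) for t in target]
scaled = [lp + math.log(2.0) for lp in perfect]  # every depth doubled

print(silog_loss(perfect, target))           # exact predictions
print(silog_loss(scaled, target, lam=1.0))   # global scale error, fully ignored
print(silog_loss(scaled, target, lam=0.85))  # same error, partially penalized
```

Note that the network output is used as is; no log of the prediction is ever taken, so there is no risk of evaluating log(0) on the prediction side.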

However, it has downsides, such as overflow and underflow (which you can solve with softplus). Also, the log-normal assumption (for depth) is not fully backed by empirical evidence, especially for outdoor scenes.
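The overflow issue is easy to reproduce. The sketch below (plain Python floats, with a hand-written `softplus` that is an illustrative helper, not a function from the repository) contrasts exp with a numerically stable softplus, which grows linearly for large inputs instead of exponentially.

```python
import math

def softplus(x):
    """Numerically stable softplus: log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

try:
    math.exp(1000.0)
    exp_overflowed = False
except OverflowError:
    exp_overflowed = True

print(exp_overflowed)    # exp cannot represent e**1000 as a float
print(softplus(1000.0))  # softplus grows linearly instead
print(softplus(-5.0))    # still positive for negative inputs
```

In finite precision softplus can still round to zero for very negative inputs, so a small epsilon or clamp may be needed if the loss takes the log of the prediction.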

To be honest, I think the choice of function (ReLU, softplus, exp...) has an impact only in combination with the chosen loss.

Let me know if this has answered your question!

yamoomd commented 2 months ago

Thanks for the answer! I'd also like to know how you came up with the idea of using SHE to generate camera embeddings.

lpiccinelli-eth commented 2 months ago

We represent the camera as a field of 3D vectors on the unit sphere; thus, imho, the natural and more mathematically correct embedding is the SH embedding, compared to, e.g., a Fourier-based embedding.

As before, we also tried Fourier embeddings, and the differences were not particularly significant.
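For readers unfamiliar with the idea, here is a rough sketch of what embedding a camera ray with spherical harmonics can look like. It assumes a pinhole camera with intrinsics `fx, fy, cx, cy` and uses real spherical harmonics up to degree 2 only; `pixel_ray` and `sh_embedding` are hypothetical helpers, and the degree, normalization, and implementation actually used in UniDepth may differ.

```python
import math

def pixel_ray(u, v, fx, fy, cx, cy):
    """Unit-norm viewing ray of pixel (u, v) for a pinhole camera."""
    x, y, z = (u - cx) / fx, (v - cy) / fy, 1.0
    norm = math.sqrt(x * x + y * y + z * z)
    return x / norm, y / norm, z / norm

def sh_embedding(ray):
    """Real spherical harmonics up to degree 2 (9 values) of a unit vector."""
    x, y, z = ray
    c1 = math.sqrt(3.0 / (4.0 * math.pi))
    c2 = 0.5 * math.sqrt(15.0 / math.pi)
    return [
        0.5 * math.sqrt(1.0 / math.pi),                     # l=0
        c1 * y, c1 * z, c1 * x,                             # l=1, m=-1,0,1
        c2 * x * y, c2 * y * z,                             # l=2, m=-2,-1
        0.25 * math.sqrt(5.0 / math.pi) * (3 * z * z - 1),  # l=2, m=0
        c2 * x * z, 0.5 * c2 * (x * x - y * y),             # l=2, m=1,2
    ]

# Toy intrinsics, chosen only for illustration.
ray = pixel_ray(100.0, 50.0, fx=500.0, fy=500.0, cx=320.0, cy=240.0)
embedding = sh_embedding(ray)
print(len(embedding))
```

Because the basis functions are defined on the sphere itself, the embedding depends only on the ray direction. A Fourier embedding would instead apply sines and cosines to some coordinate parameterization of the ray, such as its angles.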