lpiccinelli-eth / UniDepth

Universal Monocular Metric Depth Estimation

Question about Laplace Spherical Harmonic Encoding (SHE) implementation in UniDepth #77

Open · pnpmpnp opened this issue 5 days ago

pnpmpnp commented 5 days ago

Hi @lpiccinelli-eth, first of all, thank you for sharing your valuable research.

While reviewing the code that implements your paper, I came across a question: the SphHarm class at this link does not seem to be used anywhere.

According to the paper, the camera is first predicted from image observations, then Laplace Spherical Harmonic Encoding (SHE) is applied to produce camera embeddings, and finally cross-attention with these embeddings is used to estimate depth.
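To make the last step more concrete, here is a minimal single-head scaled dot-product cross-attention in plain Python, in which depth features (queries) attend to camera embeddings (keys and values). It is only a sketch: it assumes the depth features play the query role, omits the learned projections, multiple heads, and normalization a real model would use, and the function names are hypothetical rather than taken from the UniDepth code.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Single-head scaled dot-product cross-attention (hypothetical helper).

    queries: N vectors of length d (e.g. depth features, one per token).
    keys:    M vectors of length d (e.g. camera embeddings).
    values:  M vectors of length d_v, paired with the keys.
    Returns N vectors of length d_v, each a softmax-weighted average of values.
    """
    d = len(queries[0])
    scale = 1.0 / math.sqrt(d)
    d_v = len(values[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by 1/sqrt(d).
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
        weights = softmax(scores)
        # Mix the value vectors using the attention weights.
        outputs.append([sum(w * v[j] for w, v in zip(weights, values)) for j in range(d_v)])
    return outputs


# Toy usage: two depth tokens attending to two camera tokens.
depth_tokens = [[1.0, 0.0], [0.0, 1.0]]
camera_keys = [[1.0, 0.0], [0.0, 1.0]]
camera_values = [[10.0], [20.0]]
out = cross_attention(depth_tokens, camera_keys, camera_values)
```

Because the softmax weights sum to one, each output is a convex combination of the camera values, so the first depth token leans toward 10 and the second toward 20, depending on which camera key each one is more similar to.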

Could you please clarify where this part is implemented in the code? Additionally, could you explain why the code would still work without applying the SHE mentioned above?

lpiccinelli-eth commented 4 days ago

Hi @pnpmpnp, I appreciate your interest! To answer your question, we used the functional version of the spherical encoding for V1, which is the recommended version to use right now. In particular, we used degree 8 (to be honest, that is a bit of overkill; degree 3 is already fine), as you can see on line 18 of the V1 decoder. If you have any other questions, do not hesitate to ask.
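For readers following along, here is a minimal plain-Python sketch of a spherical harmonic encoding of per-pixel ray directions up to a chosen degree. It assumes a pinhole camera (fx, fy, cx, cy), measures the polar angle from the optical axis (+z), and uses the standard real, orthonormal spherical harmonics with the Condon-Shortley phase. The angle convention, normalization, and all function names are assumptions for illustration and may differ from the functional encoding in the UniDepth repository. If all degrees 0 through L are kept, the encoding has (L + 1)^2 values per ray: 81 for degree 8 and 16 for degree 3.

```python
import math


def pixel_to_angles(u, v, fx, fy, cx, cy):
    """Back-project pixel (u, v) through an assumed pinhole camera.
    Returns (theta, phi): polar angle from the optical axis (+z) and azimuth in the x-y plane."""
    x, y, z = (u - cx) / fx, (v - cy) / fy, 1.0
    norm = math.sqrt(x * x + y * y + z * z)
    return math.acos(z / norm), math.atan2(y, x)


def assoc_legendre(l, m, x):
    """Associated Legendre polynomial P_l^m(x), 0 <= m <= l, with Condon-Shortley phase."""
    pmm = 1.0
    if m > 0:
        somx2 = math.sqrt((1.0 - x) * (1.0 + x))
        fact = 1.0
        for _ in range(m):
            pmm *= -fact * somx2
            fact += 2.0
    if l == m:
        return pmm
    pmmp1 = x * (2.0 * m + 1.0) * pmm
    if l == m + 1:
        return pmmp1
    pll = 0.0
    # Upward recurrence in l for fixed m.
    for ll in range(m + 2, l + 1):
        pll = ((2.0 * ll - 1.0) * x * pmmp1 - (ll + m - 1.0) * pmm) / (ll - m)
        pmm, pmmp1 = pmmp1, pll
    return pll


def real_sh(l, m, theta, phi):
    """Real orthonormal spherical harmonic Y_l^m(theta, phi), -l <= m <= l."""
    am = abs(m)
    k = math.sqrt((2 * l + 1) / (4 * math.pi) * math.factorial(l - am) / math.factorial(l + am))
    p = assoc_legendre(l, am, math.cos(theta))
    if m == 0:
        return k * p
    if m > 0:
        return math.sqrt(2.0) * k * math.cos(m * phi) * p
    return math.sqrt(2.0) * k * math.sin(am * phi) * p


def sh_encode(theta, phi, degree):
    """Concatenate Y_l^m for l = 0..degree and m = -l..l, giving (degree + 1)**2 values."""
    return [real_sh(l, m, theta, phi)
            for l in range(degree + 1)
            for m in range(-l, l + 1)]


# Example: encode the ray through one pixel of a hypothetical 640x480 camera.
theta, phi = pixel_to_angles(400.0, 300.0, fx=500.0, fy=500.0, cx=320.0, cy=240.0)
emb8 = sh_encode(theta, phi, degree=8)  # 81 values
emb3 = sh_encode(theta, phi, degree=3)  # 16 values
```

Low degrees only represent coarse, slowly varying functions of the ray direction; each extra degree adds finer angular detail at the cost of more channels, which is consistent with degree 3 already being enough in practice.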

pnpmpnp commented 4 days ago

@lpiccinelli-eth, thank you very much for your prompt reply. Your clear answer resolved the part of my question I was unsure about.

Since I'm not very familiar with geometric embeddings based on techniques such as spherical harmonics or Fourier transforms, I'm having difficulty fully understanding how this module works and what benefits it is expected to bring. Could you share any papers or resources that helped you build intuition on this topic?