maturk / dn-splatter

DN-Splatter: Depth and Normal Priors for Gaussian Splatting and Meshing
https://maturk.github.io/dn-splatter/
Apache License 2.0
305 stars 17 forks

common interface of monodepth estimations #7

Open hardikdava opened 2 months ago

hardikdava commented 2 months ago

Great work! I see there are many interfaces that compute monocular depths from trained models. It would be good to have a single interface for monocular depth estimation. Let me know if I can help with anything; I am willing to contribute to the project.

maturk commented 2 months ago

Hi @hardikdava, you are correct, there is some legacy code that can be misleading. Let me try to explain the main code here.

python dn_splatter/scripts/depth_from_pretrain.py is used to obtain only monocular depth estimates (in this case from the ZoeDepth model). It takes a directory of images or a transforms.json file and stores the output monocular depth estimates in a folder called dataset_root/mono_depth as .npy files named after the original RGB images, like so: mono_depth/rgb_img_name.npy
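To illustrate the output layout described above, here is a minimal sketch of writing and reading one such .npy depth map. The dataset path and image name are hypothetical placeholders, and the random array stands in for an actual ZoeDepth prediction:

```python
from pathlib import Path
import numpy as np

# Hypothetical dataset layout: depth_from_pretrain.py writes one .npy
# depth map per RGB image into dataset_root/mono_depth/.
dataset_root = Path("/tmp/example_dataset")
(dataset_root / "mono_depth").mkdir(parents=True, exist_ok=True)

rgb_name = "frame_000.png"                            # hypothetical image name
depth = np.random.rand(480, 640).astype(np.float32)   # stand-in for a model prediction

# The depth map is saved under the RGB image's basename: mono_depth/frame_000.npy
out_path = dataset_root / "mono_depth" / (Path(rgb_name).stem + ".npy")
np.save(out_path, depth)

loaded = np.load(out_path)  # (480, 640) float32 monocular depth estimate
```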

On the other hand, python dn_splatter/scripts/depth_align.py does something a bit more advanced. It generates monocular depth estimates (as above) and also SfM sparse depth maps from a COLMAP database file. The script then combines the monocular depth estimates and the SfM sparse depth maps by solving for a scale alignment parameter per frame. These are saved in dataset_root/mono_depth with the _aligned.npy suffix to distinguish them from the original monocular depth estimates. This is a similar idea as in https://arxiv.org/abs/2311.13398, but instead of gradient descent, we use the closed-form linear solution to the least squares alignment problem (although both methods are supported). I am not entirely sure whether the method in that paper and my implementation are exactly the same; however, I suspect they should be similar. This is explained in Sec 4.1 of my arXiv submission and also in the above work.

Note that the performance of monocular depth supervision is not on par with real sensor depth supervision. Monocular depth supervision can help with sparse input views, but its impact on dense indoor captures is not comparable. This is a limitation of this work.
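The closed-form scale alignment mentioned above can be sketched in a few lines. This is not the repository's actual implementation, just a minimal illustration: minimizing ||s * d_mono - d_sparse||^2 over the pixels where the sparse SfM depth is valid has the closed-form solution s = <d_mono, d_sparse> / <d_mono, d_mono>:

```python
import numpy as np

def align_mono_depth(mono_depth: np.ndarray, sparse_depth: np.ndarray) -> np.ndarray:
    """Scale a monocular depth map to match a sparse SfM depth map.

    Solves min_s || s * d_mono - d_sparse ||^2 over valid sparse pixels,
    whose closed-form least squares solution is
    s = <d_mono, d_sparse> / <d_mono, d_mono>.
    """
    mask = sparse_depth > 0  # sparse maps are zero where COLMAP has no point
    d_m = mono_depth[mask]
    d_s = sparse_depth[mask]
    scale = np.dot(d_m, d_s) / np.dot(d_m, d_m)
    return scale * mono_depth

# Toy example: the mono estimate is the true depth up to an unknown scale.
np.random.seed(0)
true_depth = np.random.rand(4, 4) + 0.5
mono = true_depth / 3.0                                          # off by a factor of 3
sparse = np.where(np.random.rand(4, 4) > 0.5, true_depth, 0.0)   # sparse observations
aligned = align_mono_depth(mono, sparse)                         # recovers true_depth
```

A scale-and-shift variant (solving a 2x2 linear system per frame) follows the same idea; the scale-only version is shown here for brevity.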

Let me know if you have any questions. And I totally agree: there are quite a few interfaces, as well as some legacy code, that make the project difficult to read and understand. I hope to extend and improve the codebase.

hardikdava commented 2 months ago

@maturk Thanks for the clarification. Now things are much clearer.