isl-org / MiDaS

Code for robust monocular depth estimation described in "Ranftl et al., Towards Robust Monocular Depth Estimation: Mixing Datasets for Zero-shot Cross-dataset Transfer, TPAMI 2022"
MIT License

Understanding the differences between the models released by MiDaS #109

rahulmoorthy19 commented 3 years ago

Thank you for the great work. I just wanted to understand one small thing: I see that MiDaS v3.0 is based on different vision-transformer (DPT) architectures, trained with the novel loss function proposed for mixed-dataset learning. I would like to understand how MiDaS v2 small, v2, and v1 are designed, and how they differ from the DPT-based v3 architectures. A reference paper for those architectures would be really helpful.

ranftlr commented 3 years ago

For v2/v2.1 standard size: this is essentially the architecture introduced here https://openaccess.thecvf.com/content_cvpr_2018/papers/Xian_Monocular_Relative_Depth_CVPR_2018_paper.pdf
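
In case a concrete picture helps: below is a minimal sketch of that general design, a ResNet encoder feeding a decoder that fuses multi-scale skip features into a single relative inverse-depth map. The ResNet-50 backbone and the layer widths here are illustrative assumptions, not the released MiDaS v2.1 configuration (which uses a larger backbone).

```python
import torch
import torch.nn as nn
import torchvision


class FusionBlock(nn.Module):
    """Upsample a decoder feature to the skip feature's resolution and fuse."""

    def __init__(self, channels):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, 3, padding=1)

    def forward(self, decoder_feat, skip_feat):
        x = nn.functional.interpolate(
            decoder_feat, size=skip_feat.shape[2:],
            mode="bilinear", align_corners=False,
        )
        return self.conv(x + skip_feat)


class EncDecDepth(nn.Module):
    """ResNet encoder + skip-connection decoder in the spirit of Xian et al.
    (CVPR 2018). Backbone and widths are illustrative, not the released model."""

    def __init__(self, width=256):
        super().__init__()
        backbone = torchvision.models.resnet50(weights=None)
        self.stem = nn.Sequential(
            backbone.conv1, backbone.bn1, backbone.relu, backbone.maxpool
        )
        self.layers = nn.ModuleList(
            [backbone.layer1, backbone.layer2, backbone.layer3, backbone.layer4]
        )
        # 1x1 convs project each encoder stage to a common decoder width.
        self.proj = nn.ModuleList(
            [nn.Conv2d(c, width, 1) for c in (256, 512, 1024, 2048)]
        )
        self.fuse = nn.ModuleList([FusionBlock(width) for _ in range(3)])
        self.head = nn.Conv2d(width, 1, 3, padding=1)  # relative inverse depth

    def forward(self, x):
        feats = []
        x = self.stem(x)
        for layer in self.layers:
            x = layer(x)
            feats.append(x)
        s1, s2, s3, s4 = [p(f) for p, f in zip(self.proj, feats)]
        d = self.fuse[0](s4, s3)  # coarse-to-fine decoding with skips
        d = self.fuse[1](d, s2)
        d = self.fuse[2](d, s1)
        return self.head(d)


model = EncDecDepth()
out = model(torch.randn(1, 3, 384, 384))  # -> (1, 1, 96, 96), stride-4 output
```

The released models additionally upsample the prediction back to the input resolution; the sketch stops at the finest skip level to keep the structure visible.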

For v2 small: there is no accompanying paper. We faced two major design decisions: which backbone to use (most of the compute happens in the backbone) and the width of the skip connections. We evaluated various choices and found that the configuration that is currently online met our speed requirements at good accuracy.
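
If you want to compare the variants directly, the released models can be loaded through torch.hub using the entry points documented in this repo's README ("MiDaS" for the v2.1 standard model, "MiDaS_small" for the lightweight one; the small model additionally needs the timm package installed). The sketch below just counts parameters and times a forward pass; the input size and the CPU timing are rough assumptions for illustration, and real use should apply the matching transform from torch.hub.load("intel-isl/MiDaS", "transforms").

```python
import time
import torch

# Hub entry points documented in the MiDaS README:
# "MiDaS" = v2.1 standard model, "MiDaS_small" = lightweight variant.
for name in ("MiDaS", "MiDaS_small"):
    model = torch.hub.load("intel-isl/MiDaS", name)
    model.eval()

    n_params = sum(p.numel() for p in model.parameters())

    # 384x384 is an illustrative input size; both networks are fully
    # convolutional, so other resolutions also work.
    x = torch.randn(1, 3, 384, 384)
    with torch.no_grad():
        model(x)                # warm-up pass
        t0 = time.time()
        prediction = model(x)   # relative inverse-depth map
    print(f"{name}: {n_params / 1e6:.1f}M params, "
          f"{(time.time() - t0) * 1000:.0f} ms/frame on CPU")
```

Running this makes the trade-off concrete: almost all of the parameter and runtime difference between the two variants sits in the backbone, which is why that choice dominated the design.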