Open rahulmoorthy19 opened 3 years ago
Thank you for the great work. I just wanted to understand one small thing: I see that MiDaS v3.0 is based on different Vision Transformer architectures, with a novel loss function proposed for learning across different datasets. How are MiDaS v2 small, v2, and v1 designed, and how do they differ from the DPT-based v3 architectures? A reference paper for those architectures would be really helpful.

For v2/v2.1 standard size: this is essentially the architecture introduced here: https://openaccess.thecvf.com/content_cvpr_2018/papers/Xian_Monocular_Relative_Depth_CVPR_2018_paper.pdf

For v2 small: there is no accompanying paper. We made two major design decisions: which backbone to use (most of the compute happens in the backbone) and the width of the skip connections. We evaluated various choices and found that the one currently online met our speed requirements at good accuracy.
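To make the skip-width trade-off concrete, here is a rough back-of-the-envelope sketch of how the parameter count of a fusion convolution grows with skip-connection width. The channel counts, the number of levels, and the fusion layout are all hypothetical illustrations, not the actual MiDaS v2 small values:

```python
# Back-of-envelope sketch: cost of skip connections vs. their width.
# All channel counts below are hypothetical, NOT the real MiDaS numbers.

def conv3x3_params(c_in: int, c_out: int) -> int:
    """Parameter count of a single 3x3 convolution (weights + bias)."""
    return 3 * 3 * c_in * c_out + c_out

def decoder_fusion_params(skip_width: int, levels: int = 4) -> int:
    """Total fusion-conv parameters if each of `levels` skip connections
    carries `skip_width` channels and is fused (concatenated with the
    upsampled decoder feature of the same width) by one 3x3 conv."""
    return levels * conv3x3_params(2 * skip_width, skip_width)

if __name__ == "__main__":
    for w in (32, 64, 128, 256):
        print(f"skip width {w:>3}: {decoder_fusion_params(w):>10,} params")
```

Since a 3x3 conv costs roughly `9 * c_in * c_out` parameters, doubling the skip width roughly quadruples the decoder's fusion cost, which is why narrowing the skips is an effective lever for a speed-oriented model like v2 small.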