noahzn / Lite-Mono

[CVPR2023] Lite-Mono: A Lightweight CNN and Transformer Architecture for Self-Supervised Monocular Depth Estimation
MIT License

Sorry to bother you #150

Closed QingTianNNN closed 1 month ago

QingTianNNN commented 1 month ago

I apologize for bothering you. Could you explain what ‘local information’ and ‘global information’ mean in the context of local-global feature interaction?

noahzn commented 1 month ago

Hi, CNNs have limited receptive fields, so the features they extract focus on local regions and capture edges, textures, etc. That is the local information. Vision transformers are applied to the feature maps to extract global information, i.e. the overall context. The LGFI module combines the two, so the model can understand both fine details and overall context.
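
To make the distinction concrete, here is a minimal pure-Python sketch (not the LGFI module or any code from this repository). It compares a 3-tap convolution with a toy single-head self-attention on a 1D signal. The function names `local_conv` and `global_attention`, the kernel weights, and the use of identity query/key/value projections are all assumptions made for illustration. The idea is to change only the last element of the signal and check whether the output at position 0 reacts.

```python
import math

def local_conv(x, kernel=(0.25, 0.5, 0.25)):
    """3-tap convolution with zero padding: each output sees only its neighbours."""
    r = len(kernel) // 2
    padded = [0.0] * r + list(x) + [0.0] * r
    return [sum(k * padded[i + j] for j, k in enumerate(kernel))
            for i in range(len(x))]

def global_attention(x):
    """Toy self-attention with identity projections (illustrative simplification):
    every output is a softmax-weighted average over ALL positions."""
    out = []
    for q in x:
        scores = [q * k for k in x]
        m = max(scores)                      # subtract max for numerical stability
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        out.append(sum(w * v for w, v in zip(weights, x)) / total)
    return out

signal = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
perturbed = signal[:-1] + [0.0]              # change only the last element

# Does the output at position 0 notice a change 7 positions away?
conv_changed = local_conv(signal)[0] != local_conv(perturbed)[0]
attn_changed = global_attention(signal)[0] != global_attention(perturbed)[0]
print(conv_changed, attn_changed)            # prints: False True
```

The convolution output at position 0 depends only on its immediate neighbours, so it ignores the distant change (local information). The attention output at position 0 mixes in every position, so it shifts when any element changes (global information).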