isl-org / DPT

Dense Prediction Transformers
MIT License

Title: Question about DPT model performance and network size adjustments #95

Open JunhyeongDoyle opened 2 months ago

JunhyeongDoyle commented 2 months ago

Hi, first of all, thank you for sharing the code and resources with the community! I’ve been experimenting with the four pretrained models provided in the repository to extract depth maps. While testing, I adjusted the network size parameters (net_h, net_w) and observed that increasing these values seemed to improve the detail in the depth estimation, especially in more complex regions of the images.

However, I am concerned that increasing these values too far might introduce a trade-off in which the model focuses too heavily on local features at the cost of global geometric consistency across the image. I would like to hear your thoughts on this hypothesis: could increasing the network size reduce global geometric coherence?

Additionally, for processing images with a resolution of 1920x1080, I aim to achieve a dense depth map without geometric inconsistencies. Could you recommend which of the four pretrained weights would be best suited for this task? And, based on your experience, what would be an optimal setting for net_h and net_w to balance detail and global consistency?
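To make the net_h / net_w question concrete, here is a minimal sketch of one way to derive candidate values for a 16:9 frame. It rests on assumptions that are not confirmed in this thread: that the general-purpose weights use a 384-pixel base size, that the input is resized to multiples of 32, and that the transformer backbone uses 16x16 patches. The helper names `fit_network_size` and `patch_count` are hypothetical and are not part of the repository.

```python
def fit_network_size(img_w, img_h, short_side=384, multiple=32):
    """Return (net_w, net_h) that roughly keeps the image aspect ratio.

    short_side: target length of the shorter edge (384 is an assumed default).
    multiple:   both edges are rounded to a multiple of this (32 is assumed).
    """
    scale = short_side / min(img_w, img_h)

    def round_to_multiple(x):
        # Round to the nearest multiple, but never below one full multiple.
        return max(multiple, int(round(x / multiple)) * multiple)

    return round_to_multiple(img_w * scale), round_to_multiple(img_h * scale)


def patch_count(net_w, net_h, patch=16):
    """Number of transformer tokens for a given input size (patch=16 is assumed)."""
    return (net_w // patch) * (net_h // patch)


if __name__ == "__main__":
    for short_side in (384, 768):
        net_w, net_h = fit_network_size(1920, 1080, short_side=short_side)
        print(short_side, (net_w, net_h), patch_count(net_w, net_h))
```

Under these assumptions, a larger network size multiplies the number of tokens the backbone sees compared with a 384x384 input, which may be one reason behavior changes at higher settings. This is a way to frame the experiment, not a recommended configuration.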

Thanks again for your help and for providing this fantastic tool!

kristoftunner commented 3 weeks ago

@JunhyeongDoyle did you get an answer to the resolution part of your question? How do you create a depth image with 16:9 resolution input without degrading the image quality?

JunhyeongDoyle commented 3 weeks ago

@kristoftunner Hi, thanks for reaching out. In short, I haven't found an optimal method yet. When I kept the network size the same and used higher-resolution images with a 16:9 aspect ratio, the network struggled to extract depth accurately, especially in areas with high-frequency detail. Conversely, when I increased the network size to handle the higher resolution, the detailed areas looked better visually, but I felt that the depth values themselves became less accurate.
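The impression that accuracy drops at larger network sizes could be checked numerically rather than by eye. The sketch below is one possible approach, not a method from the repository: it assumes the predictions are relative (defined only up to an unknown scale and shift), fits that scale and shift by least squares, and reports the remaining error between two predictions of the same image. The function names `align_scale_shift` and `aligned_rmse` are hypothetical, and both inputs are assumed to be flat sequences of equal length, for example a low-resolution prediction and a high-resolution prediction downsampled to the same grid.

```python
from math import sqrt


def align_scale_shift(pred, ref):
    """Least-squares (scale, shift) such that scale * pred + shift approximates ref."""
    n = len(pred)
    if n != len(ref) or n < 2:
        raise ValueError("pred and ref must have the same length (at least 2)")
    mean_p = sum(pred) / n
    mean_r = sum(ref) / n
    var_p = sum((p - mean_p) ** 2 for p in pred)
    if var_p == 0:
        raise ValueError("pred is constant, so scale is undefined")
    cov = sum((p - mean_p) * (r - mean_r) for p, r in zip(pred, ref))
    scale = cov / var_p
    shift = mean_r - scale * mean_p
    return scale, shift


def aligned_rmse(pred, ref):
    """Root-mean-square difference after removing the scale/shift ambiguity."""
    scale, shift = align_scale_shift(pred, ref)
    sq_err = sum((scale * p + shift - r) ** 2 for p, r in zip(pred, ref))
    return sqrt(sq_err / len(pred))


if __name__ == "__main__":
    low_res = [1.0, 2.0, 3.0, 4.0]    # toy values standing in for a flattened map
    high_res = [3.0, 5.0, 7.0, 9.0]   # same structure, different scale and shift
    print(align_scale_shift(low_res, high_res))
    print(aligned_rmse(low_res, high_res))
```

A small aligned error between the two predictions would suggest the global structure is preserved and only fine detail differs, while a large one would support the hypothesis that coherence degrades. Without ground-truth depth this only measures agreement between settings, not correctness.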

kristoftunner commented 3 weeks ago

Thanks for the answer!