isl-org / ZoeDepth

Metric depth estimation from a single image
MIT License

About the datasets for metric depth estimation #44

Open weiyithu opened 1 year ago

weiyithu commented 1 year ago

Thanks for your great work!

I am curious why you use 10 datasets for relative depth training but only 2 datasets for metric depth training. Have you tried using more datasets for metric depth training? I wonder if you can use all 12 datasets for metric depth estimation. Looking forward to your response!

kwea123 commented 1 year ago

Relative depth training is done by the MiDaS team; ZoeDepth just uses the pretrained weights. Those datasets don't have metric ground truth.

northagain commented 1 year ago

In the report "MiDaS v3.1 – A Model Zoo for Robust Monocular Relative Depth Estimation", Section 5 (Applications) says that ZoeD-M12-NK incorporates a MiDaS v3.1 architecture with the BEiT-L encoder and a newly proposed metric depth binning module appended to the decoder. Training combines relative depth training of the MiDaS architecture on the 5+12 dataset mix described in Sec. 3.3, followed by metric depth fine-tuning of the prediction heads in the bins module. Extensive results verify that ZoeDepth models benefit from relative depth training via MiDaS v3.1, enabling fine-tuning on two metric depth datasets at once (NYU Depth v2 and KITTI) as well as achieving unprecedented zero-shot generalization to a diverse set of unseen metric depth datasets.
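To make the binning idea above concrete, here is a minimal sketch (not ZoeDepth's actual API; the function names, shapes, and bin values are illustrative assumptions): the head predicts per-pixel scores over a set of depth-bin centers, and the metric depth is the probability-weighted sum of those centers, which is what the metric fine-tuning stage trains.

```python
# Hypothetical sketch of a metric depth "bins" head: per-pixel logits over
# K depth-bin centers are softmaxed, and depth is the expected bin center.
# Shapes and values are illustrative, not ZoeDepth's real implementation.
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def bins_to_depth(bin_logits, bin_centers):
    """bin_logits: (H, W, K) per-pixel scores; bin_centers: (K,) depths in meters."""
    probs = softmax(bin_logits, axis=-1)       # per-pixel distribution over bins
    return (probs * bin_centers).sum(axis=-1)  # expected depth, in meters

# Toy example: a 2x2 image with 4 bins spanning 1-10 m.
centers = np.array([1.0, 4.0, 7.0, 10.0])
logits = np.zeros((2, 2, 4))
logits[0, 0, 3] = 10.0  # pixel (0,0) strongly prefers the 10 m bin
depth = bins_to_depth(logits, centers)
print(depth[0, 0])  # near 10.0: softmax puts almost all mass on the last bin
print(depth[1, 1])  # uniform logits -> mean of centers = 5.5
```

Because the output is a smooth expectation over bin centers, the head is differentiable and can be fine-tuned with a standard regression loss against metric ground truth, which is why only datasets with metric depth (NYU Depth v2, KITTI) are used in that stage.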