isl-org / ZoeDepth

Metric depth estimation from a single image
MIT License

Inaccurate depth estimation beyond 40m #45

Open mwdotzom opened 1 year ago

mwdotzom commented 1 year ago

Hello @thias15 @shariqfarooq123 , thank you for the great work!

I ran into two problems when running inference with the models on outdoor car scenes:

  1. According to your description, model_zoe_k should be the one to choose here. However, model_zoe_n and model_zoe_k both gave predictions of around 1m ~ 7m; only model_zoe_nk gave 7m ~ 65m, while the ground truth (gt) is 1m ~ 80m. The model_zoe_nk result is barely satisfactory for car instances within 10 ~ 40m (<2m error), but at close and far ranges the predictions are far from reality: for example, the front of the camera car itself is predicted at 8.5m, and a distant car with gt = 65.8m gets pred = 44.6m. The original RGB image can be downloaded here. It is also worth mentioning that the sky gets almost the same predictions as the ground, which is easy to see in the picture. This only happens with model_zoe_nk and mode="eval". Do you have any insights on how to improve the metric predictions at close and far distances? Would further training on more datasets help? (Though it is unclear what would be beneficial, given that the model is already trained on 12 datasets...)

  2. As similarly described in issue #28, I tried both the default mode ('infer') and mode="eval", but got the same results. Could you provide a detailed example of the correct way to do this with torch.hub.load()?
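
To make the range-dependent error in item 1 easier to compare across models, the gap between gt and pred can be summarized per distance bin. This is only a minimal sketch and not part of ZoeDepth: the helper name `error_by_range` and the bin edges (0, 10, 40, 80 m) are assumptions chosen to match the ranges mentioned above, and the only sample used is the distant car from item 1.

```python
def error_by_range(gt, pred, edges=(0.0, 10.0, 40.0, 80.0)):
    """Summarize depth error per ground-truth distance bin.

    gt, pred: sequences of depths in meters (same length).
    edges:    ascending bin edges in meters (hypothetical defaults).
    Returns {(lo, hi): (mean_abs_error_m, mean_abs_rel_error, count)}.
    Samples with gt <= 0 or gt outside [edges[0], edges[-1]) are skipped.
    """
    buckets = {}
    for g, p in zip(gt, pred):
        if g <= 0:
            continue  # invalid ground truth (e.g. missing LiDAR return)
        for lo, hi in zip(edges[:-1], edges[1:]):
            if lo <= g < hi:
                abs_err = abs(p - g)
                buckets.setdefault((lo, hi), []).append((abs_err, abs_err / g))
                break

    stats = {}
    for key, errs in buckets.items():
        n = len(errs)
        mean_abs = sum(e[0] for e in errs) / n
        mean_rel = sum(e[1] for e in errs) / n
        stats[key] = (mean_abs, mean_rel, n)
    return stats


# The distant car from item 1: gt = 65.8 m, pred = 44.6 m.
for (lo, hi), (mae, rel, n) in error_by_range([65.8], [44.6]).items():
    print(f"{lo:.0f}-{hi:.0f} m: abs err {mae:.1f} m, rel err {rel:.1%} (n={n})")
# prints: 40-80 m: abs err 21.2 m, rel err 32.2% (n=1)
```

Feeding in all valid gt/pred pixel pairs for each model (model_zoe_n, model_zoe_k, model_zoe_nk) would show whether the error grows steadily with distance or jumps at a particular range.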

Thank you for your time! :D

Ghul-huan commented 1 year ago

Yeah, I have the same problem.

philippwulff commented 8 months ago

model_zoe_k should be the one to choose here. However, model_zoe_n and model_zoe_k gave results of around 1m ~ 7m, only model_zoe_nk gave 7m ~ 65m

Yeah, I am also seeing this...

toannguyen1904 commented 6 months ago

Is there any solution for this? I have the same problem :((