shariqfarooq123 / AdaBins

Official implementation of AdaBins: Depth Estimation using Adaptive Bins
GNU General Public License v3.0

The result of KITTI Eigen split #58

Closed guogangok closed 2 years ago

guogangok commented 2 years ago

We tested the predicted depths in 16-bit format for the KITTI Eigen split: https://drive.google.com/drive/folders/1b3nfm8lqrvUjtYGmsqA5gptNQ8vPlzzS?usp=sharing. The result is:

Scaling ratios | med: 1.104 | std: 0.058

| abs_rel | sq_rel | rmse | rmse_log | a1 | a2 | a3 |
|---------|--------|------|----------|-----|-----|-----|
| 0.109 | 0.765 | 4.860 | 0.198 | 0.885 | 0.957 | 0.978 |

Is there anything I have missed? Thank you!
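For reference, the metrics in the table above (with median scaling) are the standard Eigen-split protocol. A minimal sketch, with illustrative function names not taken from this repo:

```python
import numpy as np

def compute_errors(gt, pred):
    """Standard monocular depth metrics: abs_rel, sq_rel, rmse, rmse_log, a1-a3."""
    thresh = np.maximum(gt / pred, pred / gt)
    a1 = (thresh < 1.25).mean()
    a2 = (thresh < 1.25 ** 2).mean()
    a3 = (thresh < 1.25 ** 3).mean()

    rmse = np.sqrt(((gt - pred) ** 2).mean())
    rmse_log = np.sqrt(((np.log(gt) - np.log(pred)) ** 2).mean())
    abs_rel = (np.abs(gt - pred) / gt).mean()
    sq_rel = (((gt - pred) ** 2) / gt).mean()
    return abs_rel, sq_rel, rmse, rmse_log, a1, a2, a3

def median_scale(gt, pred):
    """Median scaling: rescale the prediction so its median matches the GT median."""
    ratio = np.median(gt) / np.median(pred)
    return pred * ratio, ratio
```

The "Scaling ratios | med | std" line reported above corresponds to the per-image `ratio` values aggregated over the test set.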

shariqfarooq123 commented 2 years ago

That's weird. Could you please confirm you're using the official annotated depth maps downloaded from here?

guogangok commented 2 years ago

> That's weird. Could you please confirm you're using the official annotated depth maps downloaded from here?

Thanks for your reply! We used the script provided by Monodepth2 to get the gt depth. (https://github.com/nianticlabs/monodepth2/blob/master/export_gt_depth.py.) We have tested many methods in this way. It seems the gt depth is OK.

shariqfarooq123 commented 2 years ago

You need to use the official annotated depth maps (the ones used by our competitors at the time of publication; check BTS, for example, for a fair comparison).

It looks like you're using the raw projected LiDAR points as GT. These have been shown to be very noisy (unless you do a lot of checking and filtering, which the script you linked doesn't seem to do), and they sometimes yield better numbers for methods that are worse in reality, and vice versa. See, for example, the following figure from this paper:

[figure from the referenced paper comparing raw projected LiDAR points with the official annotated depth maps]
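For anyone following along: the official annotated depth maps are 16-bit PNGs storing depth multiplied by 256, with 0 marking pixels without ground truth (the KITTI depth devkit convention). A minimal sketch of reading one, assuming Pillow is available (the function name is illustrative):

```python
import numpy as np
from PIL import Image

def load_kitti_annotated_depth(path):
    """Read a KITTI annotated depth map: 16-bit PNG, value = depth_in_metres * 256."""
    png = np.asarray(Image.open(path), dtype=np.float32)
    depth = png / 256.0   # convert to metres
    valid = png > 0       # 0 means "no ground truth at this pixel"
    return depth, valid
```

Only the `valid` pixels should enter the metric computation.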

guogangok commented 2 years ago

@shariqfarooq123 Thanks for your advice. We found the cause of the problem: the predicted and GT depth maps were aligned differently. Monodepth2 uses cv2.resize to align the prediction with the GT, and this causes a big difference. We then got the following results:

Mono evaluation - using median scaling

Scaling ratios | med: 1.011 | std: 0.046

| abs_rel | sq_rel | rmse | rmse_log | a1 | a2 | a3 |
|---------|--------|------|----------|-----|-----|-----|
| 0.058 | 0.217 | 2.617 | 0.090 | 0.963 | 0.995 | 0.999 |

It is not as good as the results in your paper, but it is acceptable.

We also tested BTS; the results are:

Mono evaluation - using median scaling

Scaling ratios | med: 1.007 | std: 0.047

| abs_rel | sq_rel | rmse | rmse_log | a1 | a2 | a3 |
|---------|--------|------|----------|-----|-----|-----|
| 0.060 | 0.250 | 2.795 | 0.095 | 0.959 | 0.993 | 0.998 |

The differing evaluation protocols across SOTA methods are confusing; the community needs a uniform evaluation code.

Thank you again!

shariqfarooq123 commented 2 years ago

@guogangok I agree. The MDE community could really use better and stronger standards for evaluation.

I'm glad the issue is resolved.