jytime / Deep-SfM-Revisited

[CVPR 2021] Deep Two-View Structure-from-Motion Revisited
MIT License

About using the median value to resolve scale ambiguity when evaluating #20

Open xhchen10 opened 1 year ago

xhchen10 commented 1 year ago

Hi, thanks for sharing the impressive work.

According to the code at lines 576-585 in main.py, you use the ratio between the median values of the predicted and GT depth to scale the predicted depth. However, the predicted depth has already been scaled by the GT scale \alpha_gt (see lines 536-541 in main.py). Hence, I am confused about why rescaling by the ratio of median values is necessary (the performance drops significantly without it).

Could you kindly help me resolve the confusion? Thank you so much.

LeoPerelli commented 1 year ago

Hey @xhchen10, did you have any luck finding out the reason? I spotted that too and it seems strange.

xhchen10 commented 1 year ago

Nope. :(

jytime commented 1 year ago

Hi @xhchen10 @LeoPerelli,

The role of the "GT scale \alpha_gt (see lines 536-541 in main.py)" is to ensure that the depth values are properly normalized during the training phase, which helps stabilize training. As mentioned, such an operation can be skipped during inference.

https://github.com/jytime/Deep-SfM-Revisited/blob/ea8158d163219607be361a4777e9362f5ee6ec43/main.py#L536-L541

At the same time, "use the ratio between the median values of predicted and GT depth to scale the predicted depth (lines 576-585)" is the common protocol for depth evaluation. It is needed because of the well-known scale ambiguity problem. We simply adopted the same evaluation pipeline used in previous methods. For example, see https://github.com/nianticlabs/monodepth2/blob/b676244e5a1ca55564eb5d16ab521a48f823af31/evaluate_depth.py#L206 in monodepth2.
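For readers following along, median scaling can be sketched roughly as below. This is only a minimal illustration in pure Python, not the code in main.py or monodepth2: the names `median_scale` and `abs_rel`, the `min_depth`/`max_depth` defaults, and the toy depth values are all assumptions made for the example.

```python
from statistics import median


def median_scale(pred, gt, min_depth=1e-3, max_depth=80.0):
    """Rescale `pred` so that its median matches the median of `gt`.

    Only pixels whose GT depth lies in (min_depth, max_depth) contribute
    to the ratio. The thresholds are illustrative defaults (assumption).
    """
    valid = [(p, g) for p, g in zip(pred, gt) if min_depth < g < max_depth]
    ratio = median(g for _, g in valid) / median(p for p, _ in valid)
    return [p * ratio for p in pred], ratio


def abs_rel(pred, gt):
    """Mean absolute relative error, a common depth metric."""
    return sum(abs(p - g) / g for p, g in zip(pred, gt)) / len(gt)


# Toy data: the prediction is correct up to an unknown global scale of 0.5.
gt = [2.0, 4.0, 8.0, 16.0, 32.0]
pred = [0.5 * g for g in gt]

scaled, ratio = median_scale(pred, gt)
error_before = abs_rel(pred, gt)
error_after = abs_rel(scaled, gt)

print(ratio)         # 2.0
print(error_before)  # 0.5
print(error_after)   # 0.0
```

In this toy case the prediction has the right structure but the wrong global scale, so the error is large before median scaling and vanishes after it. That is the effect the evaluation protocol is meant to factor out.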

Please let me know if you have further questions :)

Best, Jianyuan

LeoPerelli commented 1 year ago

Thanks a lot Jianyuan, now it makes sense! :)