Wrong depth scale when using ground-truth camera poses.

nianticlabs / manydepth

[CVPR 2021] Self-supervised depth estimation from short sequences


Wrong depth scale when using ground-truth camera poses. #22

Open ootts opened 3 years ago

ootts commented 3 years ago

Hi, I have a question about using ground-truth camera poses instead of predicted camera poses. I tried using the KITTI ground-truth poses, which have the correct scale, but the predicted depth scale is still incorrect. Is there anything I missed? The only change I made to the code is the following:

output, lowest_cost, costvol = encoder(input_color, lookup_frames,
                                       relative_poses,  # change to relative_poses_gt
                                       K,
                                       invK,
                                       min_depth_bin, max_depth_bin)

Thanks a lot!

JamieWatson683 commented 3 years ago

Hi - thanks for your interest in the project!

Right yes, so the problem with this is that the depth network's predictions will be in the same scale as the pose network's - some unknown, arbitrary scale.

I'm trying to think of a way to use gt pose to scale the depth estimates, but it isn't immediately obvious.

One way you could do it would be to ask the depth + pose networks to make predictions as normal, and afterwards scale your depths by the ratio of the ground-truth translation magnitude to the predicted translation magnitude. I can't guarantee that this will give a good result, but I'd be interested to hear how you get on.
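A minimal sketch of that rescaling idea, not code from the manydepth repository. It assumes the predicted and ground-truth translations describe the same pair of frames and that depth and pose share a single global scale. The function name `rescale_depth_with_gt_translation` and its arguments are hypothetical.

```python
import numpy as np


def rescale_depth_with_gt_translation(depth_pred, t_pred, t_gt, eps=1e-8):
    """Rescale a predicted depth map into the scale of the ground-truth poses.

    depth_pred: array of predicted depths (arbitrary network scale).
    t_pred:     predicted translation between two frames (3-vector).
    t_gt:       ground-truth translation between the same two frames (3-vector).
    All names are hypothetical; this is an illustration only.
    """
    pred_norm = np.linalg.norm(t_pred)
    gt_norm = np.linalg.norm(t_gt)
    if pred_norm < eps:
        # With (almost) no predicted motion the ratio is undefined.
        raise ValueError("predicted translation is too small to estimate a scale")
    # Depth and translation scale together, so one factor corrects both.
    scale = gt_norm / pred_norm
    return np.asarray(depth_pred) * scale, scale


# Example: the network believes the camera moved 0.1 units, KITTI says 0.8 m.
depth_pred = np.full((2, 3), 5.0)
depth_metric, scale = rescale_depth_with_gt_translation(
    depth_pred, t_pred=[0.0, 0.0, 0.1], t_gt=[0.0, 0.0, 0.8]
)
```

The ratio becomes unreliable when the camera is nearly stationary, so in practice it may help to average it over many frame pairs rather than trusting a single one.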

@mdfirman any thoughts?

biggiantpigeon commented 3 years ago

I tried using gt_pose and dropping the posenet in monodepth, and the output scale is almost correct (about 0.9 * gt_depth), so I assume this will work for manydepth too? What I wonder is how I can finetune from a pretrained model, whose scale is arbitrary, to get real-world-scale results. In monodepth I scale the ground truth to the pretrained scale, and scale back when predicting. I wonder if there's a better way to do this.
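The scale-then-scale-back workaround could look roughly like the sketch below. The comment does not say which ground-truth quantity is scaled or how the factor is estimated, so this assumes a single global factor taken as the ratio of median predicted depth to median reference depth. All function names here are hypothetical.

```python
import numpy as np


def estimate_network_scale(pred_depth, ref_depth, min_depth=1e-3):
    """Estimate s such that pred_depth is roughly s * ref_depth.

    Uses a median ratio over pixels with valid reference depth. This is an
    assumption for illustration, not the commenter's confirmed method.
    """
    valid = ref_depth > min_depth
    return float(np.median(pred_depth[valid]) / np.median(ref_depth[valid]))


def to_network_scale(metric_quantity, s):
    """Bring a metric quantity (e.g. a GT translation) into the network's scale."""
    return np.asarray(metric_quantity) * s


def to_metric_scale(network_depth, s):
    """Convert network-scale depth predictions back to metric units."""
    return np.asarray(network_depth) / s


# Toy example: the pretrained network predicts depth at 0.05x metric scale.
ref_depth = np.array([[10.0, 20.0], [0.0, 40.0]])  # 0.0 marks a missing value
pred_depth = np.array([[0.5, 1.0], [3.0, 2.0]])

s = estimate_network_scale(pred_depth, ref_depth)
t_gt_scaled = to_network_scale([0.0, 0.0, 0.8], s)  # used while finetuning
depth_metric = to_metric_scale(pred_depth, s)       # applied at prediction time
```

A fixed factor like this only holds while finetuning does not drift the network's scale, so it is worth re-estimating it after training.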

ZhanyuGuo commented 2 years ago

@biggiantpigeon Hi. Have you tried using gt_pose in manydepth? Is it feasible? Thank you!