noahzn / Lite-Mono

[CVPR2023] Lite-Mono: A Lightweight CNN and Transformer Architecture for Self-Supervised Monocular Depth Estimation

How to get real-world-scale depth from the depth_decoder disparity prediction on other datasets such as NuScenes #61

Closed xuqinwang closed 1 year ago

xuqinwang commented 1 year ago

Hi, thank you for your excellent work! I was trying to use your pretrained weights to run depth inference on NuScenes camera images, following your test_simple.py script. I got the depth using

import torch
from layers import disp_to_depth  # helper used by test_simple.py

outputs = depth_decoder(features)
disp = outputs[("disp", 0)]
# resize the decoder's disparity back to the original image resolution
disp_resized = torch.nn.functional.interpolate(
    disp, (sample['original_height'][batch], sample['original_width'][batch]),
    mode="bilinear", align_corners=False)
scaled_disp, depth = disp_to_depth(disp_resized, 0.1, 100)

The disparity map looks good; however, the depth seems to have the wrong scale, so the pseudo point cloud I get from it spans only a small real-world range. Is there a way to correct this using the camera poses of consecutive image frames, without training the network on the new dataset? Many thanks!

noahzn commented 1 year ago

Hi, thanks for your interest in our work.

The disp_to_depth function maps the predictions to the depth range [0.1, 100], and the same range was used for training. The predicted values are not metric depth, but, as the Monodepth2 paper notes, a scale factor can be recovered when evaluating on the KITTI dataset. Please see the code here. You can compute a scale factor from the NuScenes ground truth by median scaling, and then multiply each value of your prediction by it.
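For reference, here is a minimal sketch of the Monodepth2-style conversion that disp_to_depth performs (the actual helper in this repo's layers.py may differ slightly in details):

def disp_to_depth(disp, min_depth, max_depth):
    # Map the network's sigmoid output in [0, 1] to a disparity between
    # 1/max_depth and 1/min_depth, then invert to obtain depth.
    min_disp = 1 / max_depth
    max_disp = 1 / min_depth
    scaled_disp = min_disp + (max_disp - min_disp) * disp
    depth = 1 / scaled_disp
    return scaled_disp, depth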

This may be very inaccurate, but it's worth a try.
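If it helps, here is a minimal sketch of per-image median scaling; the function name median_scale, the variable names, and the 80 m depth cap are illustrative assumptions, not code from this repo:

import numpy as np

def median_scale(pred_depth, gt_depth, min_depth=1e-3, max_depth=80.0):
    # pred_depth: HxW predicted depth; gt_depth: HxW sparse ground truth,
    # zero wherever there is no LiDAR return (names are illustrative).
    mask = (gt_depth > min_depth) & (gt_depth < max_depth)
    ratio = np.median(gt_depth[mask]) / np.median(pred_depth[mask])
    return pred_depth * ratio, ratio

You could also average the ratio over many frames to obtain a single global factor for the whole dataset.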

noahzn commented 1 year ago

I am now closing this thread due to lack of response.