isl-org / DPT

Dense Prediction Transformers

Reconcile depth maps from ViT and the usual MiDaS model #19

Closed RuslanOm closed 3 years ago

RuslanOm commented 3 years ago

Hi! Thanks for the great work. I'm trying to reconcile the output of the usual MiDaS model with the ViT model, but I have some problems. I need this for Open3D visualization: usually I take the inverse of the MiDaS output and get a normal 3D point cloud, but for the ViT model this pipeline breaks.

Can you please explain how I can fix this? Thanks!

ranftlr commented 3 years ago

Not sure if I understand your question correctly. Could you clarify what exactly breaks? DPT uses the same output representation as MiDaS: inverse depth up to an unknown scale and shift. So, in principle, the same visualization approach should work.
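
To make "up to an unknown scale and shift" concrete: given even a few metric reference depths, the prediction can be aligned to them with a least-squares fit, since the relation is linear in inverse-depth space. A minimal sketch (not from this repo; align_scale_shift, pred_idepth, ref_depth, and mask are hypothetical names):

import numpy as np

def align_scale_shift(pred_idepth, ref_depth, mask):
    # The prediction relates to metric inverse depth as
    # ref_idepth ~= s * pred_idepth + t, so fit s and t by least
    # squares over the pixels selected by mask
    ref_idepth = 1.0 / ref_depth[mask]
    x = pred_idepth[mask]
    A = np.stack([x, np.ones_like(x)], axis=1)
    (s, t), *_ = np.linalg.lstsq(A, ref_idepth, rcond=None)
    return s * pred_idepth + t  # aligned inverse depth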

RuslanOm commented 3 years ago

So, for example, I tested the models on this image. I also masked zeros in both depth maps before inverting and got these pictures. The first is the result of the usual MiDaS model, the second is the DPT model result.

[Screenshot 2021-05-08 at 13:21:42] [Screenshot 2021-05-08 at 13:26:13]

[image: segm_example]

I'm trying to understand the cause of the problem, but I can't.

ranftlr commented 3 years ago

Without seeing your code I can only guess: this is likely caused by far-away points that "stretch" the point cloud, and it has nothing to do with DPT. Try zooming in or removing points beyond a certain range.

I just verified that it works fine with this code:

import open3d as o3d
import numpy as np
import glob

from util.io import read_pfm

intrinsic = o3d.io.read_pinhole_camera_intrinsic("intrinsics.json")

# Color images and predicted inverse-depth maps are paired by sorted filename
c_imgs = glob.glob("./input/*.jpg")
c_imgs.sort()

d_imgs = glob.glob("./output_monodepth/*.pfm")
d_imgs.sort()

for idx in range(len(c_imgs)):
    color = o3d.io.read_image(c_imgs[idx])
    idepth = read_pfm(d_imgs[idx])[0]

    focal = intrinsic.intrinsic_matrix[0, 0]

    # Convert the predicted inverse depth to depth
    depth = focal / idepth

    # Clip far-away points; non-finite depths are dropped when the
    # point cloud is built
    depth[depth >= 50] = np.inf

    depth = o3d.geometry.Image(depth)

    # Note: create_from_color_and_depth defaults to
    # convert_rgb_to_intensity=True (grayscale cloud); pass
    # convert_rgb_to_intensity=False to keep the colors
    rgbdi = o3d.geometry.RGBDImage.create_from_color_and_depth(color, depth)
    pcd = o3d.geometry.PointCloud.create_from_rgbd_image(rgbdi, intrinsic)
    o3d.visualization.draw_geometries([pcd])

with these intrinsics (note that Open3D stores intrinsic_matrix in column-major order, so this is fx = fy = 250, cx = 320, cy = 240):

{
    "width" : 640,
    "height" : 480,
    "intrinsic_matrix" : 
    [
        250.0,
        0.0,
        0.0,
        0.0,
        250.0,
        0.0,
        320.0,
        240.0,
        1.0
    ]
}
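
If you prefer not to write the JSON by hand, the equivalent intrinsics can also be constructed directly (a sketch, using the column-major layout above):

import open3d as o3d

# width, height, fx, fy, cx, cy
intrinsic = o3d.camera.PinholeCameraIntrinsic(640, 480, 250.0, 250.0, 320.0, 240.0)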

Looks like this with DPT-Large:

[image: resulting point cloud]

RuslanOm commented 3 years ago

So, I had the same idea about this problem, thanks!

One more question: is this threshold (50) the same for all images with DPT, or should it be specified per image? My first idea is to use quantiles to cut off the depth.
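
A minimal sketch of that quantile idea, replacing the fixed depth >= 50 clip inside the loop above (the 0.95 quantile is an arbitrary choice):

# Per-image cutoff: drop everything beyond the 95th depth percentile
finite = np.isfinite(depth)
cutoff = np.quantile(depth[finite], 0.95)
depth[depth >= cutoff] = np.inf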

ranftlr commented 3 years ago

You probably need to adapt this threshold per image, as the output magnitudes depend on image shape and content. Quantiles should work. Another approach that I've found reasonably robust is to normalize and then clip based on a fixed factor of the chosen focal length. Something like this:

# Normalize the predicted inverse depth to [0, 1]
idepth = idepth - np.amin(idepth)
idepth /= np.amax(idepth)

focal = intrinsic.intrinsic_matrix[0, 0]
depth = focal / idepth

# threshold is a tunable, unitless factor (see the note below)
depth[depth >= threshold * focal] = np.inf
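
Since the normalized idepth lies in [0, 1], depth = focal / idepth is never smaller than focal, so the condition depth >= threshold * focal is equivalent to idepth <= 1 / threshold: threshold = 10, for example, drops every point whose normalized inverse depth is at most 1/10.
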
RuslanOm commented 3 years ago

Ok, I'll try this! Thanks a lot!