Hi! Thanks for the great work. I'm trying to reconcile the output of the usual MiDaS model with the DPT (ViT-based) model, but I have some problems. I need this for Open3D visualization: usually I take the inverse of the MiDaS output and get normal 3D point clouds, but for DPT this pipeline breaks.
Can you explain, please, how I can fix this? Thanks!
Not sure if I understand your question correctly. Could you clarify what exactly breaks? DPT uses the same output representation as MiDaS, inverse depth up to unknown scale and shift. So, in principle, the same visualization approach should work.
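Concretely, the prediction relates to depth roughly like this (a minimal sketch of the convention; scale and shift are the unknown per-image quantities, and the variable names are illustrative):

# prediction ~ scale * (1 / depth) + shift, with scale and shift unknown,
# so absolute depth can only be recovered once both are estimated:
depth = 1.0 / (scale * prediction + shift)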
So, for example, I tested the models on this image. I also masked zeros in both depth maps before inverting and got these pictures. The first is the result of the usual MiDaS model, the second is the DPT model result.
I'm trying to understand the reason for the problem, but I can't.
Without seeing your code I can only guess: This is likely caused by far-away points that "stretch" the point cloud and has nothing to do with DPT. Try zooming in or removing points beyond a certain range.
I just verified that it works fine with this code:
import glob

import numpy as np
import open3d as o3d

from util.io import read_pfm  # PFM reader shipped with this repository

# Camera intrinsics, see the JSON file below
intrinsic = o3d.io.read_pinhole_camera_intrinsic("intrinsics.json")

# Pair color images with the predicted inverse-depth maps by sorted filename
c_imgs = sorted(glob.glob("./input/*.jpg"))
d_imgs = sorted(glob.glob("./output_monodepth/*.pfm"))

for idx in range(len(c_imgs)):
    color = o3d.io.read_image(c_imgs[idx])
    idepth = read_pfm(d_imgs[idx])[0]

    # Convert inverse depth to depth. Zeros in idepth map to inf,
    # which is discarded together with the clipped far-away points.
    focal = intrinsic.intrinsic_matrix[0, 0]
    depth = (focal / idepth).astype(np.float32)

    # Clip far away points
    depth[depth >= 50] = np.inf

    depth = o3d.geometry.Image(depth)
    rgbdi = o3d.geometry.RGBDImage.create_from_color_and_depth(color, depth)
    pcd = o3d.geometry.PointCloud.create_from_rgbd_image(rgbdi, intrinsic)
    o3d.visualization.draw_geometries([pcd])
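The same cutoff can also be applied after reconstruction, directly on the point cloud, which is closer to "removing points beyond a certain range" as suggested above (a sketch; select_by_index exists in recent Open3D releases, and 50 is the same arbitrary range as in the code):

import numpy as np

# Keep only points within 50 units of the camera origin
pts = np.asarray(pcd.points)
keep = np.where(np.linalg.norm(pts, axis=1) < 50)[0]
pcd = pcd.select_by_index(keep)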
The intrinsics.json used above contains the following (note that Open3D stores the intrinsic_matrix in column-major order, so this is fx = fy = 250, cx = 320, cy = 240):
{
    "width": 640,
    "height": 480,
    "intrinsic_matrix": [250.0, 0.0, 0.0, 0.0, 250.0, 0.0, 320.0, 240.0, 1.0]
}
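If you would rather not maintain a JSON file, the same intrinsics can be constructed in code (a sketch using Open3D's PinholeCameraIntrinsic constructor with the values from the file above):

import open3d as o3d

# width, height, fx, fy, cx, cy -- identical to intrinsics.json
intrinsic = o3d.camera.PinholeCameraIntrinsic(640, 480, 250.0, 250.0, 320.0, 240.0)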
Looks like this with DPT-Large:
So, I had the same idea about this problem, thanks!
One more question about this threshold (50): is it the same for all images with DPT, or should it be chosen per image? My first idea is to use quantiles to cut off the depth.
You probably need to adapt this threshold per image, as the output magnitudes depend on image shape and content. Quantiles should work. Another approach that I've found reasonably robust is to normalize and then clip based on a fixed factor of the chosen focal length. Something like this:
# Shift and scale the inverse depth to [0, 1]
idepth = idepth - np.amin(idepth)
idepth /= np.amax(idepth)
focal = intrinsic.intrinsic_matrix[0, 0]
# The minimum becomes 0 and maps to inf, which the clip below discards
depth = focal / idepth
# Discard everything beyond a fixed multiple of the focal length
depth[depth >= threshold * focal] = np.inf
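The quantile variant you mention would look something like this (a sketch; the 0.95 cutoff is an arbitrary choice you would tune per use case):

import numpy as np

# Drop the farthest 5% of the finite depth values
cutoff = np.quantile(depth[np.isfinite(depth)], 0.95)
depth[depth >= cutoff] = np.inf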
Ok, I'll try this! Thanks a lot!