lpiccinelli-eth / UniDepth

Universal Monocular Metric Depth Estimation

project_points() usage #37

Open David-Yan1 opened 5 months ago

David-Yan1 commented 5 months ago

Hi! I was trying to use project_points() to recover a depth map from a modified point cloud. I noticed a black grid pattern on the output depth map, and I'm not sure whether I did something wrong or whether it's intended. Below are the depth prediction (predictions["depth"]) for an image and the depth map that project_points() produces from predictions["points"].

[Two images: the depth prediction and the project_points() depth map]
lpiccinelli-eth commented 5 months ago

Are you resizing the pointmap? The black grid appears because no points fall in those regions, which could be caused by nearest-neighbor interpolation.
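To illustrate how missing points produce a regular grid, here is a minimal pure-Python sketch (not UniDepth code) that forward-projects a low-resolution row of points into a wider target row. The helper `splat_columns` and the 4-to-8 widths are hypothetical, and the scale factor stands in for intrinsics that belong to a larger image than the point map.

```python
def splat_columns(src_width, dst_width):
    """Forward-project one row of source pixel centers into a wider target row.

    Returns the number of points that land in each target column.
    """
    scale = dst_width / src_width  # stands in for mismatched intrinsics (assumption)
    hits = [0] * dst_width
    for u in range(src_width):
        x = (u + 0.5) * scale  # continuous target coordinate of the source pixel center
        col = int(x)           # truncate to a target column index
        if 0 <= col < dst_width:
            hits[col] += 1
    return hits


hits = splat_columns(src_width=4, dst_width=8)
empty = [col for col, count in enumerate(hits) if count == 0]
print(hits)   # [0, 1, 0, 1, 0, 1, 0, 1]
print(empty)  # [0, 2, 4, 6]
```

Every other column receives no point, so a depth map built this way shows evenly spaced empty (black) pixels.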

David-Yan1 commented 5 months ago

Ah, I believe I was using incorrect intrinsics in the original post. However, the depth is still noisy. Here is a minimal script to reproduce:

[Two images attached]
import numpy as np
import torch
from PIL import Image
from unidepth.models import UniDepthV1
from unidepth.utils import colorize
from unidepth.utils.geometric import project_points

model = UniDepthV1.from_pretrained("lpiccinelli/unidepth-v1-vitl14")
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)

rgb = np.array(Image.open('assets/demo/rgb.png'))
rgb_torch = torch.from_numpy(rgb).permute(2, 0, 1)
predictions = model.infer(rgb_torch)
depth1 = predictions["depth"].squeeze().cpu().numpy()

depth_pred_col = colorize(depth1, vmin=0.01, vmax=10.0, cmap="magma_r")
Image.fromarray(depth_pred_col).save("original_depth.png")

H, W = rgb.shape[:2]
torch_intrinsic1 = predictions["intrinsics"]
# Flatten the point map: (B, 3, H, W) -> (B, 3, HW) -> (B, HW, 3)
pcd = predictions["points"].view(1, 3, -1)
pcd = pcd.transpose(1, 2)

# Project the points back to a depth map using the predicted intrinsics
depth = project_points(pcd, torch_intrinsic1, (H, W))
depth_pred_col = colorize(depth.squeeze().cpu().numpy(), vmin=0.01, vmax=10.0, cmap="magma_r")
Image.fromarray(depth_pred_col).save("projected_depth.png")
lpiccinelli-eth commented 5 months ago

Thank you for diving deeper! We have never tried this sanity check :sweat_smile:. There are several possible explanations:
1) project_points may have a problem: some points are re-projected onto other pixels, which creates that pepper noise. It may be related to rounding.
2) generate_rays slightly shifts the points, which would act like a rounding effect.
3) spherical_zbuffer_to_euclidean introduces some numerical errors; we will try with float64.
4) The interpolation used for depth creates the inconsistency.
However, the quasi-random nature of the noise makes me lean toward the first or second option.

We will investigate further and try to solve, or at least explain, the source of the problem.
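One way to reason about the rounding hypotheses is to check how sensitive each discretization is to tiny numerical errors. The sketch below uses plain Python floats and a hypothetical helper `flips`. It assumes projected coordinates land near pixel centers (half-integers), which this thread does not confirm for UniDepth.

```python
def flips(to_index, x, eps=1e-6):
    """Return True if a perturbation of +/- eps around x changes the pixel index."""
    return to_index(x - eps) != to_index(x + eps)


# Round-to-nearest is unstable at half-integer coordinates...
round_at_half = flips(round, 2.5)
# ...while truncation is stable there.
int_at_half = flips(int, 2.5)
# Truncation is instead unstable at integer coordinates.
int_at_integer = flips(int, 3.0)
round_at_integer = flips(round, 3.0)

print(round_at_half, int_at_half)        # True False
print(int_at_integer, round_at_integer)  # True False
```

If coordinates cluster near half-integers, rounding to nearest lets sub-pixel noise decide between two neighboring pixels, which could look quasi-random. Truncation has the same weakness at integer coordinates, so which one is safe depends on where the projected coordinates actually fall.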

David-Yan1 commented 5 months ago

Figured it out! I was rewriting my own projection script and the noise appeared when I used round() instead of int(). So I believe

# To pixels (rounding!!!), no int as it breaks gradient
points_2d = points_2d.round()

should be

# To pixels (rounding!!!), no int as it breaks gradient
points_2d = points_2d.int()

This fixes the noise as seen below (the comment seems to be wrong).

[Two images attached]
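A possible explanation for why truncation helps, sketched in plain Python under the assumption that points project exactly onto pixel centers (u + 0.5). Python's built-in round(), like torch.round, rounds halves to the nearest even integer, so neighboring centers collapse onto even columns. `covered_columns` is a hypothetical helper, not part of UniDepth.

```python
def covered_columns(width, to_index):
    """Return the sorted columns hit when pixel centers are converted to indices."""
    centers = [u + 0.5 for u in range(width)]  # assumed: projections land on pixel centers
    indices = {to_index(x) for x in centers}
    return sorted(i for i in indices if 0 <= i < width)


width = 8
with_round = covered_columns(width, round)  # round half to even
with_int = covered_columns(width, int)      # truncate toward zero

missing_round = [u for u in range(width) if u not in with_round]
missing_int = [u for u in range(width) if u not in with_int]
print(missing_round)  # [1, 3, 5, 7]
print(missing_int)    # []
```

With rounding, every odd column is left empty, while truncation maps each center back to its own column. Note that int() truncates toward zero, so it matches floor only for non-negative coordinates.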
lpiccinelli-eth commented 5 months ago

Thank you for your comment. Your suggestion has been included in PR #38, which also includes the V2 release.