nianticlabs / monodepth2

[ICCV 2019] Monocular depth estimation from a single image
Other
4.05k stars 949 forks source link

point cloud #459

Closed Ronald-Kray closed 1 year ago

Ronald-Kray commented 1 year ago

@Jayden9912 @10Exahertz @mrharicot @daniyar-niantic Hi, I'm making a point cloud from my own images. But, the point cloud is weird.

scale = 0 w, h = 640, 192

        K = np.array([[0.58, 0, 0.5, 0],
                      [0, 1.92, 0.5, 0],
                      [0, 0, 1, 0],
                      [0, 0, 0, 1]], dtype=np.float32)

        K[0, :] *= w // (2 ** scale)  # Kitti
        K[1, :] *= h // (2 ** scale)  # Kitti
        inv_K = np.linalg.pinv(K)
        inv_K = np.expand_dims(inv_K, 0)
        inv_K = torch.from_numpy(inv_K).to(device)

        backproject_depth = BackprojectDepth(1, h, w)
        backproject_depth.to(device)
        with torch.no_grad():
            for idx, image_path in enumerate(paths):
                if image_path.endswith("_disp.jpg"):
                    # don't try to predict disparity for a disparity image!
                    continue

                # Load image and preprocess
                input_image = pil.open(image_path).convert('RGB')
                original_width, original_height = input_image.size
                input_image = input_image.resize((feed_width, feed_height), pil.LANCZOS)
                input_image = transforms.ToTensor()(input_image).unsqueeze(0)

                # PREDICTION
                input_image = input_image.to(device)
                features = encoder(input_image)
                outputs = depth_decoder(features)

                disp = outputs[("disp", 0)]
                disp_resized = torch.nn.functional.interpolate(
                    disp, (original_height, original_width), mode="bilinear", align_corners=False)

                scaled_disp, depth = disp_to_depth(disp, 0.1, 100)

                cam_points = backproject_depth(depth, inv_K)
                cam_points = cam_points.cpu().numpy()[0].T[:, :3].astype(np.float32)
                cam_points[:, 2] * 30

                pcd = o3d.geometry.PointCloud()
                pcd.points = o3d.utility.Vector3dVector(cam_points)
                o3d.visualization.draw_geometries([pcd])  # Visualize the point cloud

                disp_resized_np = disp_resized.squeeze().cpu().numpy()
                vmax = np.percentile(disp_resized_np, 95)
                normalizer = mpl.colors.Normalize(vmin=disp_resized_np.min(), vmax=vmax)
                mapper = cm.ScalarMappable(norm=normalizer, cmap='magma')
                colormapped_im = (mapper.to_rgba(disp_resized_np)[:, :, :3] * 255).astype(np.uint8)
                plt.imshow(colormapped_im)

google_ google__disp Capture

daniyar-niantic commented 1 year ago

Hi @Ronald-Kray

Please try your code on a KITTI image using resolution that was used for training of the network.

The depthmap that you provided have does not look accurate, most likely the network did not generalize to this image.