isl-org / Open3D

Open3D: A Modern Library for 3D Data Processing
http://www.open3d.org

Unsupported input data type combination while trying the tensor reconstruction system #4655

Closed salmanmaq closed 2 years ago

salmanmaq commented 2 years ago

My Question

I have been using the legacy RGBD integration for getting a triangle mesh. Today, I tried to run the Tensor version of that as follows:

.
.
.
    device = o3c.Device('CUDA:0')

    intrinsics = o3c.Tensor(intrinsics.intrinsic_matrix, dtype=o3c.Dtype.Float64)

    voxel_block_grid = o3d.t.geometry.VoxelBlockGrid(
        attr_names=('tsdf', 'weight', 'color'),
        attr_dtypes=(o3c.float32, o3c.float32, o3c.float32),
        attr_channels=((1), (1), (3)),
        voxel_size=config.RECONSTRUCTION_VOXEL_SIZE,
        block_resolution=16,
        block_count=50000,
        device=device
    )

    video = vreader(rgb_video_path)

    for frame_num, (pose, rgb_frame) in enumerate(zip(poses, video)):
        depth_image_path = os.path.join(depth_images_directory, f'{frame_num:06}.npy')
        depth_confidence_map_path = os.path.join(depth_confidence_maps_directory, f'{frame_num:06}.npy')
        depth_image = load_depth_image(depth_image_path, depth_confidence_map_path)
        depth_image = depth_image.to(device)

        extrinsic = np.linalg.inv(pose)
        extrinsic = o3c.Tensor(extrinsic, dtype=o3c.Dtype.Float64)

        frustum_block_coordinates = voxel_block_grid.compute_unique_block_coordinates(
            depth=depth_image, intrinsic=intrinsics, 
            extrinsic=extrinsic, depth_scale=1.0,
            depth_max=5.0
        )

        rgb_image = resize_rgb_image(rgb_frame)
        rgb_image = rgb_image.to(device)

        voxel_block_grid.integrate(
            block_coords=frustum_block_coordinates, depth=depth_image, 
            color=rgb_image, intrinsic=intrinsics, extrinsic=extrinsic,
            depth_scale=1.0, depth_max=5.0
        )

    mesh = voxel_block_grid.extract_triangle_mesh()
    mesh = mesh.to_legacy()
    mesh.compute_vertex_normals()

and the error that I get at the voxel_block_grid.integrate step is:

voxel_block_grid.integrate(
RuntimeError: [Open3D Error] (open3d::t::geometry::kernel::voxel_grid::Integrate(const open3d::core::Tensor&, const open3d::core::Tensor&, const open3d::core::Tensor&, const open3d::core::Tensor&, open3d::t::geometry::TensorMap&, const open3d::core::Tensor&, const open3d::core::Tensor&, open3d::t::geometry::kernel::voxel_grid::index_t, float, float, float, float)::<lambda()>) /root/Open3D/cpp/open3d/t/geometry/kernel/VoxelBlockGrid.cpp:186: Unsupported input data type combination.

Not sure what the problem could be, but the data types for me are:

  1. depth_image: o3d.t.geometry.Image
  2. rgb_image: o3d.t.geometry.Image
  3. intrinsic: o3c.Tensor
  4. extrinsic: o3c.Tensor

I have Open3D 0.14.1 (also tried the latest development version). I also do not get any error at the voxel_block_grid.compute_unique_block_coordinates step. Any help would be appreciated. Thanks for the great work!

salmanmaq commented 2 years ago

Some update: I came across the documentation for VoxelBlockGrid and tried to print out the dtype of rgb_image and depth_image. Turns out, both of them are UInt8. I am not sure why this should be the case. I will see how to debug this. Meanwhile, any guidance would be great.
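A quick pre-check along these lines can catch the mismatch before calling integrate. This is only a sketch on NumPy arrays: the helper name and the set of accepted (depth, color) dtype pairs are assumptions for illustration and should be checked against the VoxelBlockGrid documentation for the installed version.

```python
import numpy as np

# ASSUMPTION: the (depth, color) dtype pairs accepted by integration.
# Verify this set against the VoxelBlockGrid documentation.
ASSUMED_SUPPORTED_PAIRS = {
    (np.dtype(np.float32), np.dtype(np.float32)),
    (np.dtype(np.uint16), np.dtype(np.uint8)),
}


def dtype_pair_is_supported(depth, color):
    """Hypothetical helper: True if the array dtypes form an assumed-valid pair."""
    return (depth.dtype, color.dtype) in ASSUMED_SUPPORTED_PAIRS


# Both arrays as UInt8, as observed in this thread: rejected by the check.
depth_u8 = np.zeros((4, 4), dtype=np.uint8)
color_u8 = np.zeros((4, 4, 3), dtype=np.uint8)
print(dtype_pair_is_supported(depth_u8, color_u8))
```

Running the check on the arrays right before they are wrapped as images points at the data loading step rather than at the integration call.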

theNded commented 2 years ago

The Open3D part looks fine. Could you please check the original data first, and then the data loading part?

salmanmaq commented 2 years ago

Thank you for the quick reply. I managed to solve the problem. The issue was indeed with how the data was loaded. Previously, I was doing something like this:

  1. The depth maps are stored as npy arrays (in mm). In the code, I convert those to meters by dividing the values by 1000, convert the result to a contiguous np array, and cast it to o3d.t.geometry.Image.
  2. For the RGB images, I read the frames one by one from a video (as np arrays) -> Convert to PIL image -> Resize -> Convert back to np array (resulting values 0-255) -> Cast to o3d.t.geometry.Image

Both of these were somehow interpreted as UInt8, which, as I realized from the VoxelBlockGrid documentation, is not a valid input combination. The interpretation was a bit strange, as I explicitly load the depth map as float32. Anyway, I modified the data loading process as follows:

  1. Depth maps (npy) -> Divide by 1000 -> o3d.t.geometry.Image(np.ascontiguousarray(depth_image, dtype=np.float32)) | Explicitly specify the np.float32 dtype when converting to a contiguous array.
  2. RGB images from video -> Convert to PIL -> Resize -> Convert back to np array -> Divide by 255 to get a float representation -> o3d.t.geometry.Image(np.ascontiguousarray(rgb_image, dtype=np.float32)) | Again, explicitly specify the dtype.
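The NumPy side of the two steps above can be sketched roughly as follows. The function names are hypothetical, and the PIL resize and video reading are left out. The point is only that both arrays end up as contiguous float32 before being handed to o3d.t.geometry.Image.

```python
import numpy as np


def depth_mm_to_float32_m(depth_mm):
    """Hypothetical helper: depth in millimeters -> contiguous float32 meters."""
    depth_m = np.asarray(depth_mm, dtype=np.float32) / 1000.0
    # Explicit dtype so the result cannot silently end up as another type.
    return np.ascontiguousarray(depth_m, dtype=np.float32)


def rgb_uint8_to_float32(rgb):
    """Hypothetical helper: 0-255 color values -> contiguous float32 in [0, 1]."""
    rgb_f = np.asarray(rgb, dtype=np.float32) / 255.0
    return np.ascontiguousarray(rgb_f, dtype=np.float32)


# Small synthetic inputs standing in for a loaded depth map and a resized frame.
depth = depth_mm_to_float32_m(np.array([[1000, 2500]], dtype=np.uint16))
color = rgb_uint8_to_float32(np.array([[[0, 255, 51]]], dtype=np.uint8))
print(depth.dtype, color.dtype)
```

The returned arrays can then be wrapped with o3d.t.geometry.Image as in the steps above and moved to the device. Printing the image dtype afterwards is a cheap way to confirm that the float32 type survived the conversion.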

Thank you so much for the response.