isl-org / Open3D

Open3D: A Modern Library for 3D Data Processing
http://www.open3d.org

Tensor reconstruction system supports only Float64 extrinsics #4832

Open ghost opened 2 years ago

ghost commented 2 years ago


My Question

Hi, I'm using the tensor reconstruction system and I've noticed that the transformations from the poses are not being passed in correctly. After some digging, I found that the reconstruction system accepts only Float64 extrinsic values:

```cpp
// Open3D/cpp/open3d/t/geometry/Utility.h
if (extrinsic.GetDtype() != core::Dtype::Float64) {
    utility::LogError("Unsupported extrinsic matrix dtype {}",
                      extrinsic.GetDtype().ToString());
}
```
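For reference, here's a minimal sketch (with a stand-in identity matrix, not my actual extrinsic) showing the dtype that trips this check from the Python side:

```Python
import numpy as np
import open3d.core as o3c

extrinsic = o3c.Tensor(np.eye(4), dtype=o3c.Dtype.Float32)
print(extrinsic.dtype)  # Float32 -- exactly what triggers the LogError above
```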

My transformations are in Float64, but with this dtype all of the values "collapse" into a single point and the camera just rotates in place (only the rotation values in the extrinsic come through correctly).

If I create a PointCloud with the transformation values passed in as positions cast to Float32, I can see the pose graph just fine; with Float64 I only see one point. As mentioned, I cannot pass Float32 into the reconstruction pipeline, namely when calling `frustum_block_coords = vbg.compute_unique_block_coordinates(depth, intrinsic, extrinsic)` (even with the dtype Float64 specified).

Any solution/workaround for this?

yxlao commented 2 years ago

Are you using Python? Here's a potential quick fix:

```Python
frustum_block_coords = vbg.compute_unique_block_coordinates(
    depth, intrinsic, extrinsic.to(o3d.core.Dtype.Float64))
```

Similarly, if `extrinsic` is a numpy array, you can do `extrinsic.astype(np.float64)`.
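A minimal self-contained sketch of both casts, using a stand-in identity extrinsic rather than a real pose:

```Python
import numpy as np
import open3d.core as o3c

extrinsic_np = np.eye(4, dtype=np.float32)      # e.g. an extrinsic that arrived as Float32

# numpy-side cast:
print(extrinsic_np.astype(np.float64).dtype)    # float64

# Open3D-tensor-side cast:
extrinsic_t = o3c.Tensor(extrinsic_np)
print(extrinsic_t.to(o3c.Dtype.Float64).dtype)  # Float64
```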

This can be improved in the C++ code to handle both Float32 and Float64, @theNded.

ghost commented 2 years ago

Thanks for the reply; yes, I'm using Python.

As mentioned, I've tried casting between the two dtypes in several ways (including the ones you mentioned); however, casting to Float64 breaks it, leaving me with a single point in the middle of the scene. Float32 works fine and can be used when creating an individual point cloud, but it cannot be used in the pipeline due to the Float64 requirement for the extrinsic.

I've also quickly tried building Open3D from source and simply deleting the Float64 assertions in Utility.h (or adding support for Float32), but the assumption seems to be used further down the pipeline, so supporting both would require a bit more work.

EDIT: When reading the poses from the file, they are in Float64 by default. I need to cast them to Float32 to make it work with the PointCloud (which I can't do with the pipeline). Creating tensors of the poses (only the translation values/coordinates) with both dtypes returns the same values (but only Float64 works): [screenshot: the printed tensor values are identical for both dtypes]

(Same result with o3d.core.float64/32)
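For concreteness, a minimal repro of that comparison, with made-up translation values that are exactly representable in both dtypes:

```Python
import numpy as np
import open3d.core as o3c

t = np.array([1.25, -0.5, 3.0])  # made-up translation values
print(o3c.Tensor(t, dtype=o3c.Dtype.Float64))  # same values...
print(o3c.Tensor(t, dtype=o3c.Dtype.Float32))  # ...just a different dtype
```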

EDIT 2: Since I can pass Float32 into an individual point cloud, here's what it looks like when reading the poses (which are in Float64): [screenshot: just a single point]

and after casting to Float32: [screenshot: the correct trajectory]

Thus, I need to cast the Float64 poses into Float32 when I want to visualise them in a point cloud. I wanted to do the same trick with the reconstruction pipeline, but as mentioned, the reconstruction pipeline only accepts Float64.

theNded commented 2 years ago

The setup is quite confusing. Specifically, I don't understand:

  1. What exactly are your point clouds? Are they actually scanned point clouds, or just poses? You only showed poses and said "a single point" is wrong and "a trajectory" is correct, so I'm assuming you are talking about poses only.
  2. How do you visualize the trajectory? It doesn't make sense that the tensors hold the same values but are shown at different positions.

My understanding is that there are two problems: one with the pose conversion, and one with the tensor reconstruction. I assume something goes wrong in the pose visualization that leads you to believe the conversions are wrong. Please describe your setup step by step, ideally with commented code snippets.

ghost commented 2 years ago

Sorry for the confusion, let me try again:

- I pass the intrinsic and extrinsic into the reconstruction pipeline like this:
```Python
intrinsic = load_intrinsic()  # Read using the Open3D json config file
intrinsic = o3c.Tensor(intrinsic, dtype=o3c.Dtype.Float64)

extrinsic = o3c.Tensor(np.linalg.inv(poses[i]), dtype=o3c.Dtype.Float64)

frustum_block_coords = vbg.compute_unique_block_coordinates(
    depth, intrinsic, extrinsic, config['depth_scale'], config['depth_max'])

color = o3d.t.io.read_image(imgs_left_color[i]).to(device)
vbg.integrate(frustum_block_coords, depth, color, intrinsic, extrinsic,
              config['depth_scale'], config['depth_max'])
```

- To understand what's going on, I wanted to visualise only the camera trajectory/poses to see whether it is really the case that the camera is "not moving". I've extracted only the **translation values** from the poses and created a _separate_ point cloud to visualise the trajectory (images in the previous post):
```Python
pcd = o3d.t.geometry.PointCloud(device)
pcd.point["positions"] = o3c.Tensor(np.array(poses)[:20, :3, 3], dtype=o3c.Dtype.Float64, device=device)
# Takes the first 20 poses, first three rows, fourth column (index 3), i.e. the translation values only
# Here I stuck with Float64
```

Just to filter out possible suggestions:

Hope this helps @theNded

theNded commented 2 years ago

Thanks for the clarification. Let's figure out the pose issue first, as I cannot reproduce it with my own pose files, which are also loaded into numpy. Could you please provide the pose file and the snippet you use to load the poses?

ghost commented 2 years ago

No problem. The pose file is in the 3x4 matrix format; it's from the KITTI odometry dataset (sequence 7), and the file was produced by ORB-SLAM2 (its built-in function for saving KITTI trajectories):

CameraTrajectory.txt

Here is the function for reading the poses:

```Python
import numpy as np

def load_poses():
    poses = []
    with open("../lib/orb-slam2/CameraTrajectory.txt", "r") as f:
        for line in f.readlines():
            pose = np.fromstring(line, sep=' ').reshape(-1, 4)  # 3x4 row-major [R|t]
            pose = np.vstack((pose, [0.0, 0.0, 0.0, 1.0]))      # homogenise to 4x4
            poses.append(pose)

    return np.array(poses)  # Changed to numpy for easier indexing; the issue was the same with just a list
```
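For reference, a quick check of what this loader returns (numpy parses the text as double precision by default):

```Python
poses = load_poses()
print(poses.shape)  # (N, 4, 4) -- one homogeneous pose per line
print(poses.dtype)  # float64   -- np.fromstring defaults to double precision
```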

Here, again, is the code I'm using to create the point cloud:

```Python
import numpy as np
import open3d as o3d
import open3d.core as o3c

poses = load_poses()

device = o3c.Device("CPU:0")
pcd = o3d.t.geometry.PointCloud(device)
pcd.point["positions"] = o3c.Tensor(np.array(poses)[:20, :3, 3], dtype=o3c.Dtype.Float64, device=device)  # First 20 poses

o3d.t.io.write_point_cloud("./point_clouds/pointcloud.pcd", pcd)
# Changing the dtype in either the numpy array or in the tensor above produces the same result
# The trajectory is shown correctly with Float32
```
ghost commented 2 years ago

@theNded Sorry for the ping/bump. I'm sure you'd let me know in this thread if there were any update; I just want to check, as this is a fairly critical part of a project with a strict deadline, and I would otherwise have to look for a different solution. That said, I'll try to look into the issue myself and potentially submit a pull request, but right now I have to focus on other aspects of the project. Do you think I can expect a fix or further information anytime soon?

theNded commented 2 years ago

Sorry for the delay. I have been very busy (I also have strict deadlines), and this issue somehow fell out of my inbox...

I tried loading your trajectory, and all of the snippets below produce a reasonable visualization:

```Python
pcd = o3d.t.geometry.PointCloud(o3c.Tensor(poses[:20, :3, 3]))
```

or

```Python
pcd = o3d.t.geometry.PointCloud(o3c.Tensor(poses[:20, :3, 3], dtype=o3c.Dtype.Float64))
```

or

```Python
pcd = o3d.t.geometry.PointCloud()
pcd.point['positions'] = o3c.Tensor(poses[:20, :3, 3], dtype=o3c.Dtype.Float64)
```

or

```Python
pcd = o3d.t.geometry.PointCloud()
pcd.point['positions'] = o3c.Tensor(np.array(poses)[:20, :3, 3], dtype=o3c.Dtype.Float64)
```

[screenshot: the first 20 poses form a plausible trajectory]

Full trajectory also makes sense: [screenshot: full trajectory]

Tested on 0.14.1 and 0.15.2.

ghost commented 2 years ago

Thanks for getting back to me and testing the poses. After trying all the alternatives you mentioned without success, I noticed I was using the legacy API to read the point cloud before visualising it, i.e. `o3d.io.read_point_cloud(...)` instead of `o3d.t.io.read_point_cloud(...)`.

With the tensor API, the visualisation looks correct when using Float64. I assume some friction between the legacy and tensor systems is expected, but it might still be worth adding a check/warning message for such cases?
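For anyone hitting the same thing, a minimal sketch of the round trip (file name and values made up):

```Python
import numpy as np
import open3d as o3d
import open3d.core as o3c

# Write Float64 positions with the tensor API...
pts = np.random.rand(20, 3)
pcd_t = o3d.t.geometry.PointCloud()
pcd_t.point["positions"] = o3c.Tensor(pts, dtype=o3c.Dtype.Float64)
o3d.t.io.write_point_cloud("pointcloud.pcd", pcd_t)

# ...then read it back with both APIs and compare.
legacy = o3d.io.read_point_cloud("pointcloud.pcd")    # legacy reader (what I had)
tensor = o3d.t.io.read_point_cloud("pointcloud.pcd")  # tensor reader (correct here)
print(np.asarray(legacy.points)[:3])
print(tensor.point["positions"][:3])
```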

Thanks for the help!

theNded commented 2 years ago

This should not happen in theory. @reyanshsolis, are there compatibility issues when writing double-precision data to a .pcd with t.io and then reading it back with io?