isl-org / Open3D

Open3D: A Modern Library for 3D Data Processing
http://www.open3d.org

Tensor reconstruction system supports only Float64 extrinsics #4832

Open ghost opened 2 years ago

ghost commented 2 years ago


My Question

Hi, I'm using the tensor reconstruction system and I've noticed that the transformations from the poses are not being passed in correctly. After some digging, I found that the reconstruction system accepts only Float64 extrinsic values:

```cpp
// Open3D/cpp/open3d/t/geometry/Utility.h
if (extrinsic.GetDtype() != core::Dtype::Float64) {
    utility::LogError("Unsupported extrinsic matrix dtype {}",
                      extrinsic.GetDtype().ToString());
}
```
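For reference, here's a minimal sketch (with a stand-in identity matrix, not my actual extrinsic) showing the dtype that trips this check from the Python side:

```Python
import numpy as np
import open3d.core as o3c

extrinsic = o3c.Tensor(np.eye(4), dtype=o3c.Dtype.Float32)
print(extrinsic.dtype)  # Float32 -- exactly what triggers the LogError above
```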

My transformations are in Float64, but with this dtype all of the values "collapse" into a single point and the camera just rotates in place (only the rotation values in the extrinsic come through correctly).

If I create a PointCloud with the transformation values passed in as positions cast to Float32, I can see the pose graph just fine; with Float64 I only see one point. As mentioned, I cannot pass Float32 into the reconstruction pipeline, namely when calling `frustum_block_coords = vbg.compute_unique_block_coordinates(depth, intrinsic, extrinsic)` (even with the dtype Float64 specified).

Any solution/workaround for this?

yxlao commented 2 years ago

Are you using Python? Here's a potential quick fix:

```Python
frustum_block_coords = vbg.compute_unique_block_coordinates(
    depth, intrinsic, extrinsic.to(o3d.core.Dtype.Float64))
```

Similarly, if `extrinsic` is a numpy array, you can do `extrinsic.astype(np.float64)`.
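A minimal self-contained sketch of both casts, using a stand-in identity extrinsic rather than a real pose:

```Python
import numpy as np
import open3d.core as o3c

extrinsic_np = np.eye(4, dtype=np.float32)      # e.g. an extrinsic that arrived as Float32

# numpy-side cast:
print(extrinsic_np.astype(np.float64).dtype)    # float64

# Open3D-tensor-side cast:
extrinsic_t = o3c.Tensor(extrinsic_np)
print(extrinsic_t.to(o3c.Dtype.Float64).dtype)  # Float64
```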

This can be improved in the C++ code to handle both Float32 and Float64, @theNded.

ghost commented 2 years ago

Thanks for the reply; yes, I'm using Python.

As mentioned, I've tried casting between the two dtypes in several ways (including the ones you mentioned); however, casting to Float64 breaks it, leaving me with a single point in the middle of the scene. Float32 works fine and can be used when creating an individual point cloud, but it cannot be used in the pipeline due to the Float64 requirement for the extrinsic.

I've also quickly tried building Open3D from source and simply deleting the Float64 assertions in Utility.h (or adding support for Float32), but the assumption seems to be used further down the pipeline, so supporting both would require a bit more work.

EDIT: When reading the poses from the file, they are in Float64 by default. I need to cast them to Float32 to make it work with the PointCloud (which I can't do with the pipeline). Creating tensors of the poses (only the translation values/coordinates) with both dtypes returns the same values (but only Float64 works): [screenshot: the printed tensor values are identical for both dtypes]

(Same result with o3d.core.float64/32)
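For concreteness, a minimal repro of that comparison, with made-up translation values that are exactly representable in both dtypes:

```Python
import numpy as np
import open3d.core as o3c

t = np.array([1.25, -0.5, 3.0])  # made-up translation values
print(o3c.Tensor(t, dtype=o3c.Dtype.Float64))  # same values...
print(o3c.Tensor(t, dtype=o3c.Dtype.Float32))  # ...just a different dtype
```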

EDIT 2: Since I can pass Float32 into an individual point cloud, here's what it looks like when reading the poses (which are in Float64): [screenshot: just a single point]

and after casting to Float32: [screenshot: the correct trajectory]

Thus, I need to cast the Float64 poses into Float32 when I want to visualise them in a point cloud. I wanted to do the same trick with the reconstruction pipeline, but as mentioned, the reconstruction pipeline only accepts Float64.

theNded commented 2 years ago

The setup is quite confusing. Specifically, I don't understand:

  1. What exactly are your point clouds? Are they actually scanned point clouds, or just poses? You only showed poses and said "a single point" is wrong and "a trajectory" is correct, so I'm assuming you are talking about poses only.
  2. How do you visualize the trajectory? It doesn't make sense that the tensors hold the same values but are shown at different positions.

My understanding is that there are two problems: one with the pose conversion, and one with the tensor reconstruction. I assume something goes wrong in the pose visualization that leads you to believe the conversions are wrong. Please describe your setup step by step, ideally with commented code snippets.

ghost commented 2 years ago

Sorry for the confusion, let me try again:

- I pass the intrinsic and extrinsic into the reconstruction pipeline like this:
```Python
intrinsic = load_intrinsic()  # Read using the Open3D json config file
intrinsic = o3c.Tensor(intrinsic, dtype=o3c.Dtype.Float64)

extrinsic = o3c.Tensor(np.linalg.inv(poses[i]), dtype=o3c.Dtype.Float64)

frustum_block_coords = vbg.compute_unique_block_coordinates(
    depth, intrinsic, extrinsic, config['depth_scale'], config['depth_max'])

color = o3d.t.io.read_image(imgs_left_color[i]).to(device)
vbg.integrate(frustum_block_coords, depth, color, intrinsic, extrinsic,
              config['depth_scale'], config['depth_max'])
```

- To understand what's going on, I wanted to visualise only the camera trajectory/poses to see whether it is really the case that the camera is "not moving". I've extracted only the **translation values** from the poses and created a _separate_ point cloud to visualise the trajectory (images in the previous post):
```Python
pcd = o3d.t.geometry.PointCloud(device)
pcd.point["positions"] = o3c.Tensor(np.array(poses)[:20, :3, 3], dtype=o3c.Dtype.Float64, device=device)
# Takes the first 20 poses, first three rows, fourth column (index 3), i.e. the translation values only
# Here I stuck with Float64
```

Just to filter out possible suggestions:

Hope this helps @theNded

theNded commented 2 years ago

Thanks for the clarification. Let's figure out the pose issue first, as I cannot reproduce it with my own pose files, which are also loaded into numpy. Could you please provide the pose file and the snippet you use to load the poses?

ghost commented 2 years ago

No problem. The pose file is in the 3x4 matrix format; it's from the KITTI odometry dataset (sequence 7), and the file was produced by ORB-SLAM2 (its built-in function for saving KITTI trajectories):

CameraTrajectory.txt

Here is the function for reading the poses:

```Python
import numpy as np

def load_poses():
    poses = []
    with open("../lib/orb-slam2/CameraTrajectory.txt", "r") as f:
        for line in f.readlines():
            pose = np.fromstring(line, sep=' ').reshape(-1, 4)  # 3x4 row-major [R|t]
            pose = np.vstack((pose, [0.0, 0.0, 0.0, 1.0]))      # homogenise to 4x4
            poses.append(pose)

    return np.array(poses)  # Changed to numpy for easier indexing; the issue was the same with just a list
```
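For reference, a quick check of what this loader returns (numpy parses the text as double precision by default):

```Python
poses = load_poses()
print(poses.shape)  # (N, 4, 4) -- one homogeneous pose per line
print(poses.dtype)  # float64   -- np.fromstring defaults to double precision
```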

Here, again, is the code I'm using to create the point cloud:

```Python
import numpy as np
import open3d as o3d
import open3d.core as o3c

poses = load_poses()

device = o3c.Device("CPU:0")
pcd = o3d.t.geometry.PointCloud(device)
pcd.point["positions"] = o3c.Tensor(np.array(poses)[:20, :3, 3], dtype=o3c.Dtype.Float64, device=device)  # First 20 poses

o3d.t.io.write_point_cloud("./point_clouds/pointcloud.pcd", pcd)
# Changing the dtype in either the numpy array or in the tensor above produces the same result
# The trajectory is shown correctly with Float32
```
ghost commented 2 years ago

@theNded Sorry for the ping/bump. I'm sure you'd let me know in this thread if there were any update; I just want to check, as this is a fairly critical part of a project with a strict deadline, and I would otherwise have to look for a different solution. That said, I'll try to look into the issue myself and potentially submit a pull request, but right now I have to focus on other aspects of the project. Do you think I can expect a fix or further information anytime soon?

theNded commented 2 years ago

Sorry for the delay. I have been very busy (I also have strict deadlines), and this issue somehow fell out of my inbox...

I tried loading your trajectory, and all of the snippets below produce a reasonable visualization:

```Python
pcd = o3d.t.geometry.PointCloud(o3c.Tensor(poses[:20, :3, 3]))
```

or

```Python
pcd = o3d.t.geometry.PointCloud(o3c.Tensor(poses[:20, :3, 3], dtype=o3c.Dtype.Float64))
```

or

```Python
pcd = o3d.t.geometry.PointCloud()
pcd.point['positions'] = o3c.Tensor(poses[:20, :3, 3], dtype=o3c.Dtype.Float64)
```

or

```Python
pcd = o3d.t.geometry.PointCloud()
pcd.point['positions'] = o3c.Tensor(np.array(poses)[:20, :3, 3], dtype=o3c.Dtype.Float64)
```

[screenshot: the first 20 poses form a plausible trajectory]

Full trajectory also makes sense: [screenshot: full trajectory]

Tested on 0.14.1 and 0.15.2.

ghost commented 2 years ago

Thanks for getting back to me and testing the poses. After trying all the alternatives you mentioned without success, I noticed I was using the legacy API to read the point cloud before visualising it, i.e. `o3d.io.read_point_cloud(...)` instead of `o3d.t.io.read_point_cloud(...)`.

With the tensor API, the visualisation looks correct when using Float64. I assume some friction between the legacy and tensor systems is expected, but it might still be worth adding a check/warning message for such cases?
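For anyone hitting the same thing, a minimal sketch of the round trip (file name and values made up):

```Python
import numpy as np
import open3d as o3d
import open3d.core as o3c

# Write Float64 positions with the tensor API...
pts = np.random.rand(20, 3)
pcd_t = o3d.t.geometry.PointCloud()
pcd_t.point["positions"] = o3c.Tensor(pts, dtype=o3c.Dtype.Float64)
o3d.t.io.write_point_cloud("pointcloud.pcd", pcd_t)

# ...then read it back with both APIs and compare.
legacy = o3d.io.read_point_cloud("pointcloud.pcd")    # legacy reader (what I had)
tensor = o3d.t.io.read_point_cloud("pointcloud.pcd")  # tensor reader (correct here)
print(np.asarray(legacy.points)[:3])
print(tensor.point["positions"][:3])
```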

Thanks for the help!

theNded commented 2 years ago

This should not happen in theory. @reyanshsolis, are there compatibility issues when writing double-precision data to a .pcd with t.io and then reading it back with io?