marek-simonik / record3d

Accompanying library for the Record3D iOS app (https://record3d.app/). Allows you to receive the RGBD stream from iOS devices with TrueDepth camera(s).
GNU Lesser General Public License v2.1

Doubts about the generation of the 3D point cloud, intrinsic matrix and alignment between RGB and depth frames. #76

Open yorard123 opened 7 months ago

yorard123 commented 7 months ago

Hi,

I have a few questions about how to reconstruct the 3D point cloud:

  1. Here you mention that the intrinsic parameters are those of the RGB camera; however, here it says they are those of the depth camera. Which of the two versions is correct?
  2. How can I project the depth channel onto the RGB image? Do we have access to the projection matrix? Are both frames already aligned?
  3. I saw that the recordings contain a file called confidence; would it be possible to get those confidence scores via USB streaming?

In parallel, and setting aside the unknowns listed above, I attempted a first 3D point cloud reconstruction with the following code:

import numpy as np
import cv2
import liblzfse 
import json
import open3d as o3d
import pathlib

def load_depth(filepath):
    # Depth maps are stored as lzfse-compressed float32 arrays (in meters).
    with open(filepath, 'rb') as depth_fh:
        raw_bytes = depth_fh.read()
        decompressed_bytes = liblzfse.decompress(raw_bytes)
        depth_img = np.frombuffer(decompressed_bytes, dtype=np.float32)
        depth_img = depth_img.reshape((256, 192))  # native depth resolution (portrait)
    return depth_img

def load_conf(filepath):
    # Confidence maps are lzfse-compressed int8 arrays with values
    # 0 (low), 1 (medium) and 2 (high).
    with open(filepath, 'rb') as conf_fh:
        raw_bytes = conf_fh.read()
        decompressed_bytes = liblzfse.decompress(raw_bytes)
        conf = np.frombuffer(decompressed_bytes, dtype=np.int8)
        conf = conf.reshape((256, 192))
    return np.float32(conf)

def create_point_cloud_depth(depth, rgb, fx, fy, cx, cy):
    # Back-project every pixel through the pinhole model:
    # X = (u - cx) * Z / fx, Y = (v - cy) * Z / fy, Z = depth.
    depth_shape = depth.shape
    [x_d, y_d] = np.meshgrid(range(depth_shape[1]), range(depth_shape[0]))
    x3 = (x_d - cx) * depth / fx
    y3 = (y_d - cy) * depth / fy
    z3 = depth

    coord = np.stack((x3, y3, z3), axis=2)

    rgb_norm = rgb / 255

    return np.concatenate((coord, rgb_norm), axis=2)

if __name__ == '__main__':
    root_path = pathlib.Path('./data/011924')
    depth_filepath = root_path.joinpath('rgbd/0.depth')
    rgb_filepath = root_path.joinpath('rgbd/0.jpg')
    conf_filepath = root_path.joinpath('rgbd/0.conf')

    conf = load_conf(str(conf_filepath))
    depth_img = load_depth(str(depth_filepath))

    rgb = cv2.imread(str(rgb_filepath))
    rgb = cv2.cvtColor(rgb, cv2.COLOR_BGR2RGB)

    with open(root_path.joinpath('metadata'), "rb") as f:
        metadata = json.loads(f.read())

    K = np.asarray(metadata['K'])

    # cv2.resize takes (width, height): upscale depth and confidence to the
    # 720x960 RGB resolution (the two frames are already aligned).
    depth_resized = cv2.resize(depth_img, (720, 960))
    # The third positional argument of cv2.resize is `dst`, so the
    # interpolation flag must be passed as a keyword argument.
    conf_resized = cv2.resize(conf, (720, 960),
                              interpolation=cv2.INTER_NEAREST_EXACT).reshape(-1)

    # K is the flattened RGB intrinsic matrix in column-major order:
    # K[0] = fx, K[4] = fy, K[6] = cx, K[7] = cy.
    pc = create_point_cloud_depth(depth_resized, rgb, K[0], K[4], K[6], K[7]).reshape(-1, 6)

    # Keep only high-confidence (value 2) points.
    pc = pc[conf_resized >= 2]
    print(pc.shape)

    pcd = o3d.geometry.PointCloud()
    pcd.points = o3d.utility.Vector3dVector(pc[:, :3])
    pcd.colors = o3d.utility.Vector3dVector(pc[:, 3:])

    o3d.visualization.draw_geometries([pcd]) 

The point cloud generated by this code is bigger than the one exported by the app. Am I doing something wrong?

marek-simonik commented 7 months ago

Hi,

let me answer your questions:

  1. The provided intrinsic matrix is that of the RGB camera. The description of the API contains a mistake; I apologize for that.
  2. The RGB and depth images should already be aligned by Apple, so you can e.g. scale the depth image to the same resolution as the RGB image (as you do in your script); the opposite approach is sketched below.
  3. I currently do not plan to add the confidence image as part of the USB stream, but if enough people ask for this feature, I might add it.
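
For completeness, here is a minimal sketch of the opposite approach from point 2: instead of upscaling the depth map, scale the RGB intrinsics down to the native 192x256 depth resolution. The helper name is hypothetical, and it assumes a 720x960 RGB image and the same column-major K layout as in your script.

import cv2
import numpy as np

def backproject_at_depth_resolution(depth, rgb, K):
    # Scale factors from the RGB resolution to the depth resolution.
    h_d, w_d = depth.shape
    sx, sy = w_d / rgb.shape[1], h_d / rgb.shape[0]
    fx, fy = K[0] * sx, K[4] * sy
    cx, cy = K[6] * sx, K[7] * sy
    # Since the frames are aligned, colors can be sampled by downscaling
    # the RGB image to the depth resolution.
    rgb_small = cv2.resize(rgb, (w_d, h_d), interpolation=cv2.INTER_AREA) / 255
    # Back-project through the pinhole model with the scaled intrinsics.
    x_d, y_d = np.meshgrid(np.arange(w_d), np.arange(h_d))
    x3 = (x_d - cx) * depth / fx
    y3 = (y_d - cy) * depth / fy
    return np.concatenate((np.stack((x3, y3, depth), axis=2), rgb_small), axis=2)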

I tested your script and the scale of the resulting point cloud looks OK to me; I visually compared it to the corresponding OBJ file exported by Record3D and did not notice a scale discrepancy (at least not when manually aligning the two point clouds in CloudCompare after exporting the Open3D point cloud via o3d.io.write_point_cloud('/tmp/o3d_pc.ply', pcd)).

May I ask how much bigger the Record3D-exported point cloud was compared to the point cloud generated by your script, and which Record3D export option you used?

yorard123 commented 7 months ago

Hi, when I said bigger I was not referring to the scale of the point cloud; I was talking about the number of points. For instance, the OBJ that the application exports has about 300k points, while the point cloud my script generates has about 600k.

About that:

I currently do not plan to add the confidence image as part of the USB stream, but if enough people ask for this feature, I might add it.

I think it would be super useful and would help a lot in getting better 3D reconstructions in streaming applications. If I can help in any way to get that done, please let me know.

marek-simonik commented 7 months ago

Apologies for the delay in my response and thanks for explaining what you meant.

The lower number of points in the exported point clouds is the expected outcome: I deliberately reduced the point count in order to keep the exported files small (after all, the depth map is being upscaled during reconstruction, so no geometric detail is lost).

Regarding the addition of the confidence map to the USB stream: I will try to implement it in the next Record3D update.

yorard123 commented 7 months ago

That makes sense! What criteria did you apply to reduce the number of points?

marek-simonik commented 7 months ago

The RGB images were downsampled to VGA resolution.
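
For reference, VGA means 640x480 = 307,200 pixels (480 wide by 640 tall in portrait), which matches the ~300k points in the exported OBJ, while the full 720x960 resolution used in the script above yields up to 691,200 points before confidence filtering. Below is a minimal sketch of how the export density could be replicated; the helper name and the choice of INTER_AREA interpolation are assumptions, not necessarily Record3D's actual implementation.

import cv2
import numpy as np

def downsample_to_vga(rgb, depth, K):
    # Portrait VGA: 480x640 = 307,200 pixels (~300k points before filtering).
    w, h = 480, 640
    s = w / rgb.shape[1]  # uniform scale factor, e.g. 480 / 720 = 2/3
    rgb_vga = cv2.resize(rgb, (w, h), interpolation=cv2.INTER_AREA)
    depth_vga = cv2.resize(depth, (w, h))  # bilinear upscaling of the depth map
    # The intrinsics scale by the same factor as the image.
    fx, fy = K[0] * s, K[4] * s
    cx, cy = K[6] * s, K[7] * s
    return rgb_vga, depth_vga, (fx, fy, cx, cy)

Usage with the earlier script would then be:

rgb_vga, depth_vga, (fx, fy, cx, cy) = downsample_to_vga(rgb, depth_img, K)
pc = create_point_cloud_depth(depth_vga, rgb_vga, fx, fy, cx, cy).reshape(-1, 6)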