leolyliu / HOI4D-Instructions

The official instructions of HOI4D dataset.
MIT License

Inconsistent data visualizations for hand/object poses and scene point clouds #12

Closed zerchen closed 1 year ago

zerchen commented 1 year ago

Hi @leolyliu,

Thanks for your great work! I visualized the hand/object poses and the scene point clouds together and found that they are inconsistent: the object should sit on the table, but it does not. I followed the practice provided in the README to load the hand/object poses and loaded the scene point cloud directly from "3DSeg/raw_pc.pcd". Is there anything missing here to make these annotations line up? Thanks for your help in advance.

Best, Zerui

[screenshot of the misaligned visualization attached]

zerchen commented 1 year ago

Solved. raw_pc.pcd is actually not in the camera coordinate system of the first frame (i.e. the world coordinate system), so an additional alignment step is needed to recover the relative transformation between raw_pc.pcd and the world coordinate system.

anyeZHY commented 1 year ago

@zerchen Hi! I’m facing the same problem now. How did you solve it?

zerchen commented 1 year ago

Hi @anyeZHY ,

You first need to load the provided raw_pc.pcd:

raw_pcd_path = os.path.join(anno_dir, seq_id.replace('-', '/'), '3Dseg', 'raw_pc.pcd')
given_pc = o3d.io.read_point_cloud(raw_pcd_path)

Then, reconstruct the whole-scene point cloud directly from the per-frame RGB-D images and camera poses:

# Helpers defined elsewhere in my script: `natsorted` is from the natsort
# package, `phc` is the pinhole camera intrinsic
# (o3d.camera.PinholeCameraIntrinsic), `cam_pose[frame_idx]` holds the
# per-frame camera parameters, and `convert` masks the hand/object out of
# the depth map so only the static background is reconstructed.
whole_points = []
for filename in natsorted(os.listdir(hand_pose_dir)):
    frame_idx = int(filename.split('.')[0])
    rgb_filename = str(frame_idx).zfill(5) + '.jpg'
    rgb_path = os.path.join(rgb_dir, seq_id.replace('-', '/'), 'align_rgb', rgb_filename)
    depth_filename = str(frame_idx).zfill(5) + '.png'
    depth_path = os.path.join(depth_dir, seq_id.replace('-', '/'), 'align_depth', depth_filename)
    mask_path = os.path.join(mask_dir, depth_filename)

    color_raw = o3d.io.read_image(rgb_path)
    depth_background = convert(depth_path, mask_path)
    rgbd_image = o3d.geometry.RGBDImage.create_from_color_and_depth(color_raw, depth_background, convert_rgb_to_intensity=False)
    pcd = o3d.geometry.PointCloud.create_from_rgbd_image(rgbd_image, phc, extrinsic=cam_pose[frame_idx].extrinsic)
    pcd = pcd.voxel_down_sample(voxel_size=0.02)
    points = np.asarray(pcd.points)
    whole_points.append(points)
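The `convert` helper is not shown in the thread; presumably it zeroes out the dynamic hand/object pixels in the depth map before reconstruction, since Open3D skips pixels with zero depth. A minimal sketch of that masking step (the function name `mask_depth` and the mask convention, non-zero on hand/object pixels, are my assumptions; in practice the result would be wrapped in `o3d.geometry.Image` before building the RGBD image):

```python
import numpy as np

def mask_depth(depth, mask):
    """Zero out hand/object pixels so only the static background
    contributes to the reconstruction; Open3D ignores depth == 0."""
    depth = depth.copy()
    depth[mask > 0] = 0
    return depth

# Tiny example: a 2x2 depth map (millimeters) with one hand pixel masked.
depth = np.array([[100, 200], [300, 400]], dtype=np.uint16)
mask = np.array([[0, 1], [0, 0]], dtype=np.uint8)
print(mask_depth(depth, mask))  # the 200 mm pixel becomes 0
```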

Finally, recover the relative transformation between the reconstructed point cloud and the given point cloud via ICP:

whole_points = np.concatenate(whole_points)
recon_pc = o3d.geometry.PointCloud()
recon_pc.points = o3d.utility.Vector3dVector(whole_points)
recon_pc = recon_pc.voxel_down_sample(voxel_size=0.01)
recon_pc, _ = recon_pc.remove_statistical_outlier(nb_neighbors=16, std_ratio=0.5)
given_pc, _ = given_pc.remove_statistical_outlier(nb_neighbors=16, std_ratio=0.5)

reg_p2p = o3d.pipelines.registration.registration_icp(recon_pc, given_pc, 1.0, np.eye(4), o3d.pipelines.registration.TransformationEstimationPointToPoint())
align_matrix = reg_p2p.transformation

The visualization of the hand/object poses together with the scene should then look right after transforming by align_matrix: world_obj_pose = align_matrix @ world_obj_pose
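For concreteness, applying align_matrix is just a left-multiplication of 4x4 homogeneous matrices. A minimal numpy sketch with made-up values (both matrices below are illustrative, not HOI4D data):

```python
import numpy as np

# Hypothetical object-to-world pose: identity rotation, small translation.
world_obj_pose = np.eye(4)
world_obj_pose[:3, 3] = [0.1, 0.2, 0.3]

# Hypothetical alignment from ICP: shift the scene 0.5 m along z.
align_matrix = np.eye(4)
align_matrix[2, 3] = 0.5

# Left-multiplying re-expresses the pose in the aligned scene frame.
aligned_pose = align_matrix @ world_obj_pose
print(aligned_pose[:3, 3])  # -> [0.1 0.2 0.8]
```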

Hope it helps.

renyuanzhe commented 6 months ago

Hello, does the author provide the camera pose of each frame?

leolyliu commented 6 months ago

@renyuanzhe The camera intrinsics are released at https://drive.google.com/file/d/1mQaPw_dVLYE31BM3_2M_gHoo_ufAJ6kw/view?usp=sharing.