Everloom-129 opened 4 hours ago
Update:
```
(sam2) tonyw@pal:~/VLM/ReKep$ sh vision.sh
# main
Debug: Input image shape: (480, 640, 3)
Debug: Input depth shape: (480, 640)
Debug: Generated 22 masks
Debug: masks shape: (480, 640)
Debug: Type of masks: <class 'list'>
Debug: Generated point cloud with shape: (307200, 3)
# proposal part
Debug: shape_info: {'img_h': 480, 'img_w': 640, 'patch_h': 34, 'patch_w': 45}
Debug: transformed_rgb shape: (476, 630, 3)
Debug: interpolated_feature_grid shape: torch.Size([480, 640, 384])
Debug: features_flat shape: torch.Size([307200, 384])
Debug: points shape: (307200, 3)
Debug: features_flat shape: torch.Size([307200, 384])
Debug: number of mask groups: 22
Debug: shape of first mask group: (480, 640)
Debug: shape of first mask: (480, 640)
Debug: feature_points shape: (10124, 3)
Debug: feature_pixels shape: (10124, 2)
Debug: obj_features_flat shape: torch.Size([10124, 384])
```
Above are the shapes of the data I feed into the system.
Here is my fork for RGB-D camera deployment:
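For context, the 307,200-point cloud is exactly what you get from back-projecting every pixel of a 480×640 depth map. A minimal sketch of that step (the intrinsics `fx, fy, cx, cy` below are placeholder values, not the D435's actual calibration):

```python
import numpy as np

def depth_to_point_cloud(depth, fx, fy, cx, cy):
    """Back-project an (H, W) depth map into an (H*W, 3) point cloud."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # per-pixel coordinates
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=-1).reshape(-1, 3)

# A 480x640 depth map yields 480 * 640 = 307200 points, matching the log above.
depth = np.ones((480, 640), dtype=np.float32)
points = depth_to_point_cloud(depth, fx=600.0, fy=600.0, cx=320.0, cy=240.0)
print(points.shape)  # (307200, 3)
```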
https://github.com/Everloom-129/ReKep/blob/vision_w_rgbd/keypoint_proposal.py
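For reference, the `interpolated_feature_grid` step in my log upsamples the 34×45 patch-level features to the full 480×640 pixel grid. This is not the actual ReKep code, just a sketch of that upsampling with `scipy` (the real feature dimension is 384; I use 8 here to keep the example light):

```python
import numpy as np
from scipy.ndimage import zoom

# Patch-grid features, as in the log (feat_dim is 384 in ReKep; 8 keeps this sketch small).
patch_h, patch_w, feat_dim = 34, 45, 8
patch_features = np.random.rand(patch_h, patch_w, feat_dim)

# Upsample the patch grid to image resolution; order=1 gives (bi)linear interpolation.
img_h, img_w = 480, 640
interpolated = zoom(patch_features, (img_h / patch_h, img_w / patch_w, 1.0), order=1)
features_flat = interpolated.reshape(-1, feat_dim)

print(interpolated.shape, features_flat.shape)  # (480, 640, 8) (307200, 8)
```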
Could you check whether there are any NaN values in your K array before running K-means?
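To illustrate the kind of check I mean, here is a small filter that reports and drops non-finite rows from a feature array before clustering (a generic sketch, not ReKep's actual code):

```python
import numpy as np

def check_finite(features, name="features"):
    """Report rows containing NaN/Inf and return only the finite rows."""
    features = np.asarray(features, dtype=np.float64)
    finite_rows = np.isfinite(features).all(axis=1)
    n_bad = int((~finite_rows).sum())
    if n_bad:
        print(f"{name}: {n_bad} rows contain NaN/Inf; dropping them before K-means")
    return features[finite_rows]

feats = np.array([[1.0, 2.0], [np.nan, 0.0], [3.0, np.inf], [4.0, 5.0]])
clean = check_finite(feats)
print(clean.shape)  # (2, 2)
```

NaNs commonly enter through invalid depth pixels (zeros or dropouts from the RealSense), which propagate into the back-projected points and features.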
Hi everyone,
I’m reproducing the ReKep experiment with a RealSense RGB-D camera (D435), but I'm encountering some issues. The pipeline consistently generates many proposed keypoints, and the results are not as expected. I suspect the following factors could be contributing to the problem:
- Workspace constraints in the configuration may not match the robot's actual setup.
- The scene complexity is higher than in the original setup (which used a wrist-mounted FPV camera).
Additionally, I'm seeing cases where the k-means algorithm fails to converge; I suspect this may be related to the issues above. Any suggestions on troubleshooting or improving these aspects would be appreciated.
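One thing I'm experimenting with for the workspace mismatch is cropping the point cloud to the robot's reachable volume before keypoint proposal, so clutter outside the workspace doesn't inflate the number of clusters. A rough sketch (the bounds below are placeholders, not my real setup):

```python
import numpy as np

# Placeholder workspace bounds in the world frame (metres); adjust to your robot.
BOUNDS_MIN = np.array([-0.5, -0.5, 0.1])
BOUNDS_MAX = np.array([0.5, 0.5, 1.0])

def crop_to_workspace(points, features):
    """Keep only the points (and their paired features) inside the workspace box."""
    mask = np.all((points >= BOUNDS_MIN) & (points <= BOUNDS_MAX), axis=1)
    return points[mask], features[mask]

# Three points: in-bounds, out-of-bounds in x, in-bounds.
points = np.array([[0.0, 0.0, 0.5], [2.0, 0.0, 0.5], [0.1, -0.1, 0.2]])
features = np.arange(9, dtype=float).reshape(3, 3)
cropped_pts, cropped_feats = crop_to_workspace(points, features)
print(cropped_pts.shape)  # (2, 3)
```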