Everloom-129 opened 4 hours ago
Update:
```
(sam2) tonyw@pal:~/VLM/ReKep$ sh vision.sh
# main
Debug: Input image shape: (480, 640, 3)
Debug: Input depth shape: (480, 640)
Debug: Generated 22 masks
Debug: masks shape: (480, 640)
Debug: Type of masks: <class 'list'>
Debug: Generated point cloud with shape: (307200, 3)
# proposal part
Debug: shape_info: {'img_h': 480, 'img_w': 640, 'patch_h': 34, 'patch_w': 45}
Debug: transformed_rgb shape: (476, 630, 3)
Debug: interpolated_feature_grid shape: torch.Size([480, 640, 384])
Debug: features_flat shape: torch.Size([307200, 384])
Debug: points shape: (307200, 3)
Debug: features_flat shape: torch.Size([307200, 384])
Debug: number of mask groups: 22
Debug: shape of first mask group: (480, 640)
Debug: shape of first mask: (480, 640)
Debug: feature_points shape: (10124, 3)
Debug: feature_pixels shape: (10124, 2)
Debug: obj_features_flat shape: torch.Size([10124, 384])
```
Above are the shapes of the data I feed into the system.
Here is my fork for RGB-D camera deployment:
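For context, the 307,200-point cloud is exactly what you get from back-projecting every pixel of a 480×640 depth map. A minimal sketch of that step (the intrinsics `fx, fy, cx, cy` below are placeholder values, not the D435's actual calibration):

```python
import numpy as np

def depth_to_point_cloud(depth, fx, fy, cx, cy):
    """Back-project an (H, W) depth map into an (H*W, 3) point cloud."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # per-pixel coordinates
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=-1).reshape(-1, 3)

# A 480x640 depth map yields 480 * 640 = 307200 points, matching the log above.
depth = np.ones((480, 640), dtype=np.float32)
points = depth_to_point_cloud(depth, fx=600.0, fy=600.0, cx=320.0, cy=240.0)
print(points.shape)  # (307200, 3)
```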
https://github.com/Everloom-129/ReKep/blob/vision_w_rgbd/keypoint_proposal.py
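For reference, the `interpolated_feature_grid` step in my log upsamples the 34×45 patch-level features to the full 480×640 pixel grid. This is not the actual ReKep code, just a sketch of that upsampling with `scipy` (the real feature dimension is 384; I use 8 here to keep the example light):

```python
import numpy as np
from scipy.ndimage import zoom

# Patch-grid features, as in the log (feat_dim is 384 in ReKep; 8 keeps this sketch small).
patch_h, patch_w, feat_dim = 34, 45, 8
patch_features = np.random.rand(patch_h, patch_w, feat_dim)

# Upsample the patch grid to image resolution; order=1 gives (bi)linear interpolation.
img_h, img_w = 480, 640
interpolated = zoom(patch_features, (img_h / patch_h, img_w / patch_w, 1.0), order=1)
features_flat = interpolated.reshape(-1, feat_dim)

print(interpolated.shape, features_flat.shape)  # (480, 640, 8) (307200, 8)
```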
Could you check whether there are any NaN values in your K array before running K-means?
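To illustrate the kind of check I mean, here is a small filter that reports and drops non-finite rows from a feature array before clustering (a generic sketch, not ReKep's actual code):

```python
import numpy as np

def check_finite(features, name="features"):
    """Report rows containing NaN/Inf and return only the finite rows."""
    features = np.asarray(features, dtype=np.float64)
    finite_rows = np.isfinite(features).all(axis=1)
    n_bad = int((~finite_rows).sum())
    if n_bad:
        print(f"{name}: {n_bad} rows contain NaN/Inf; dropping them before K-means")
    return features[finite_rows]

feats = np.array([[1.0, 2.0], [np.nan, 0.0], [3.0, np.inf], [4.0, 5.0]])
clean = check_finite(feats)
print(clean.shape)  # (2, 2)
```

NaNs commonly enter through invalid depth pixels (zeros or dropouts from the RealSense), which propagate into the back-projected points and features.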
Hi everyone,
I’m reproducing the ReKep experiment with a RealSense RGB-D camera (D435), but I'm encountering some issues. The pipeline consistently generates many proposed keypoints, and the results are not as expected. I suspect the following factors could be contributing to the problem:
- Workspace constraints in the configuration may not match the robot's actual setup.
- The scene complexity is higher than in the original setup (which used a wrist-mounted FPV camera).
Additionally, I'm seeing cases where the k-means algorithm fails to converge; I suspect this may be related to the issues above. Any suggestions on troubleshooting or improving these aspects would be appreciated.
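One thing I'm experimenting with for the workspace mismatch is cropping the point cloud to the robot's reachable volume before keypoint proposal, so clutter outside the workspace doesn't inflate the number of clusters. A rough sketch (the bounds below are placeholders, not my real setup):

```python
import numpy as np

# Placeholder workspace bounds in the world frame (metres); adjust to your robot.
BOUNDS_MIN = np.array([-0.5, -0.5, 0.1])
BOUNDS_MAX = np.array([0.5, 0.5, 1.0])

def crop_to_workspace(points, features):
    """Keep only the points (and their paired features) inside the workspace box."""
    mask = np.all((points >= BOUNDS_MIN) & (points <= BOUNDS_MAX), axis=1)
    return points[mask], features[mask]

# Three points: in-bounds, out-of-bounds in x, in-bounds.
points = np.array([[0.0, 0.0, 0.5], [2.0, 0.0, 0.5], [0.1, -0.1, 0.2]])
features = np.arange(9, dtype=float).reshape(3, 3)
cropped_pts, cropped_feats = crop_to_workspace(points, features)
print(cropped_pts.shape)  # (2, 3)
```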