xukechun / Vision-Language-Grasping

[ICRA 2023] A Joint Modeling of Vision-Language-Action for Target-oriented Grasping in Clutter

Test on real dataset #5

Closed lif314 closed 9 months ago

lif314 commented 10 months ago

Thank you for your work. How can I test it on a real dataset, e.g., the GraspNet-1Billion dataset?

xukechun commented 9 months ago

Hi,

Thanks for your attention to our work! The GraspNet-1Billion dataset is designed to evaluate grasp detection performance, so it only provides RGB-D images without language instructions. Although the object models are provided, accurate pose information for each object in a scene is not released in a form that lets us reproduce those scenes in simulation for manipulation interaction. Thus, we only use their object models to generate scenes for testing.

If you want to test on the same scenes as the GraspNet-1Billion dataset, you can first obtain the pose of each object in a scene using their pose annotation tool, reproduce the scene in simulation, and then pair each scene with a language instruction.
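For a concrete starting point, here is a minimal, untested sketch of that workflow (an editorial illustration, not part of this repository): it reads object indices and poses from a scene's annotation file and spawns the corresponding meshes in PyBullet, then pairs the reproduced scene with an instruction string. The dataset root, meta-file keys (`cls_indexes`, `poses`), and model paths are assumptions based on the YCB-Video-style annotation convention that GraspNet-1Billion follows, and the annotated poses are in the camera frame, so a real setup would also apply the camera extrinsics before spawning.

```python
# Sketch only: reproduce one GraspNet-1Billion frame in PyBullet and pair it
# with a language instruction. File layout and .mat keys are assumptions;
# verify them against your local copy of the dataset.
import os
import pybullet as p
import scipy.io as sio
from scipy.spatial.transform import Rotation as R

GRASPNET_ROOT = "/path/to/graspnet"   # hypothetical dataset root
SCENE_ID, CAMERA, ANN_ID = 0, "kinect", 0

def load_object_poses(root, scene_id, camera, ann_id):
    """Read per-object 6-DoF poses from a frame's meta file (assumed layout)."""
    meta_path = os.path.join(root, "scenes", f"scene_{scene_id:04d}",
                             camera, "meta", f"{ann_id:04d}.mat")
    meta = sio.loadmat(meta_path)
    obj_ids = meta["cls_indexes"].flatten().astype(int)  # object model indices
    poses = meta["poses"]                                 # assumed 3x4xN, camera frame
    return [(obj_ids[i], poses[:, :, i]) for i in range(len(obj_ids))]

def spawn_scene(root, objects):
    """Spawn each object mesh at its annotated pose in PyBullet."""
    p.connect(p.DIRECT)                                   # or p.GUI to visualize
    body_ids = []
    for obj_id, pose in objects:
        # Assumed model path; GraspNet ships per-object mesh folders.
        mesh = os.path.join(root, "models", f"{obj_id:03d}", "textured.obj")
        pos = pose[:, 3]                                  # translation
        quat = R.from_matrix(pose[:, :3]).as_quat()       # (x, y, z, w)
        col = p.createCollisionShape(p.GEOM_MESH, fileName=mesh)
        vis = p.createVisualShape(p.GEOM_MESH, fileName=mesh)
        body_ids.append(p.createMultiBody(baseMass=0.05,
                                          baseCollisionShapeIndex=col,
                                          baseVisualShapeIndex=vis,
                                          basePosition=pos,
                                          baseOrientation=quat))
    return body_ids

if __name__ == "__main__":
    objects = load_object_poses(GRASPNET_ROOT, SCENE_ID, CAMERA, ANN_ID)
    spawn_scene(GRASPNET_ROOT, objects)
    instruction = "give me the banana"   # example target-oriented instruction
```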

Best, Kechun Xu

lif314 commented 9 months ago

Thank you for your reply!