real-stanford / universal_manipulation_interface

Universal Manipulation Interface: In-The-Wild Robot Teaching Without In-The-Wild Robots
https://umi-gripper.github.io/
MIT License

Dataset Structure Problem #27

Open YihanLi126 opened 3 months ago

YihanLi126 commented 3 months ago

Hello dear authors,

I've run the SLAM pipeline and got the generated dataset for diffusion policy training in the form of the following six folders:

  1. camera0_rgb
  2. robot0_demo_start_pose
  3. robot0_demo_end_pose
  4. robot0_eef_pose
  5. robot0_eef_rot_axis_angle
  6. robot0_gripper_width
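
For illustration, here is a rough mock-up of per-timestep arrays with these names. All shapes and dtypes below are my own guesses for the sake of the sketch, not confirmed by the repo:

```python
import numpy as np

T = 100  # hypothetical number of timesteps in one demonstration

# Assumed per-timestep layout; shapes/dtypes are illustrative guesses,
# not taken from the UMI codebase.
dataset = {
    "camera0_rgb":               np.zeros((T, 224, 224, 3), np.uint8),   # wrist-camera frames
    "robot0_demo_start_pose":    np.zeros((T, 6), np.float32),           # pose at demo start
    "robot0_demo_end_pose":      np.zeros((T, 6), np.float32),           # pose at demo end
    "robot0_eef_pose":           np.zeros((T, 3), np.float32),           # end-effector position
    "robot0_eef_rot_axis_angle": np.zeros((T, 3), np.float32),           # rotation vector
    "robot0_gripper_width":      np.zeros((T, 1), np.float32),           # gripper opening

    # Print each array's name, shape, and dtype for a quick overview
}
for name, arr in dataset.items():
    print(f"{name}: shape={arr.shape}, dtype={arr.dtype}")
```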

I'm confused about the contents of these six folders, since most of them are binary files. Here are my questions:

  1. For robot0_demo_start_pose and robot0_demo_end_pose, what is the structure of the pose information, and what do 'start' and 'end' refer to? Are there any specific timestamps?
  2. Is robot0_eef_pose in 3D Cartesian coordinates, and is robot0_eef_rot_axis_angle a quaternion?
  3. Do camera0_rgb, robot0_demo_start_pose, and robot0_demo_end_pose act as `At`, and robot0_eef_pose, robot0_eef_rot_axis_angle, and robot0_gripper_width act as `Ot` in the input of the diffusion policy?
  4. What is the exact structure of the input and output of the SLAM pipeline?
  5. What is the function of the marker in the Mapping Video step of data collection? Is it only for scale calibration, or does it also act as a reference for global localization of the end effector?

Thank you so much for your patience!

bjin-bdai commented 3 months ago

I think all "poses" (e.g. robot0_eef_pose) are task-space coordinates in the following form:

so robot0_eef_rot_axis_angle is a 3-element rotation vector (axis-angle representation: the vector's direction is the rotation axis and its norm is the rotation angle in radians), not a quaternion.
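
To make the rotation-vector convention concrete, here is a small sketch using SciPy (the specific numeric values are just an example, not taken from the dataset):

```python
import numpy as np
from scipy.spatial.transform import Rotation

# A rotation vector: direction = rotation axis, norm = angle in radians.
# Hypothetical example: a 90-degree rotation about the z-axis.
rotvec = np.array([0.0, 0.0, np.pi / 2])
rot = Rotation.from_rotvec(rotvec)

print(rot.as_matrix().round(3))    # equivalent 3x3 rotation matrix
print(rot.as_quat().round(3))      # equivalent quaternion in (x, y, z, w) order
print(rot.apply([1.0, 0.0, 0.0]))  # the x-axis rotates onto the y-axis
```

Converting back is symmetric: `Rotation.from_quat(...).as_rotvec()` recovers the 3-element vector, which is why storing axis-angle loses nothing relative to quaternions while using one fewer number.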