Closed: dennisushi closed this issue 5 months ago
Hi @dennisushi, thanks for your interest! First, regarding the dataset format, you can find an example here or here. For the instructions, you're right, you can follow the process outlined in CALVIN. Depending on the task, you may need more or fewer demos. Based on our experience, 15 demos were enough for most of the real-world tasks we considered. We recorded only the keyposes and trained the model to predict the next keypose only. Good luck!
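For readers unfamiliar with keypose-based recording: a minimal sketch of one common keypose-selection heuristic, in the spirit of CALVIN/RLBench pipelines. The velocity threshold and the exact criteria here are assumptions for illustration, not necessarily the criterion used in this repo:

```python
import numpy as np

def extract_keyposes(gripper_open, joint_velocities, vel_eps=1e-2):
    """Heuristic keypose selection: keep frames where the gripper state
    toggles or the arm comes (near) to rest, plus the final frame.

    gripper_open: (T,) array of 0/1 gripper-open states.
    joint_velocities: (T, J) array of joint velocities.
    """
    T = len(gripper_open)
    keyframes = []
    for t in range(1, T):
        toggled = gripper_open[t] != gripper_open[t - 1]
        at_rest = np.abs(joint_velocities[t]).max() < vel_eps
        if toggled or at_rest:
            keyframes.append(t)
    if T - 1 not in keyframes:   # always keep the last frame
        keyframes.append(T - 1)
    return keyframes
```

With a heuristic like this, the number of keyposes naturally varies per demo, which is consistent with predicting only the next keypose at training time.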
Thanks for the quick response!
Hi, I am struggling to reproduce the real-world results. I made a dataset of 14 demonstrations of the simplest task I could think of, reaching an object, moving the starting position of the robot and the position of the object in each demonstration. I trained the model for 600k steps as default. The position loss barely moved. Can you give any tips on what may be wrong?
For reference, this is what the demos look like: ep0.zip, each having 4-6 keyposes depending on the positions (should the number of keyposes be the same across all?):
The state data is collected as:

```python
import numpy as np

# 4x4 homogeneous end-effector pose from the robot API
ee_pose = robot_arm.get_pose()
gripper_trans = ee_pose[:3, 3]                          # xyz position
gripper_quat = rotation_to_quaternion(ee_pose[:3, :3])  # orientation as quaternion
gripper_open = float(not robot_arm.is_grasped())        # 1.0 = open, 0.0 = closed
gripper_input = np.concatenate([gripper_trans, gripper_quat, [gripper_open]])
```
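The snippet above calls a helper `rotation_to_quaternion` that is not shown; one possible implementation uses SciPy. Note this is a sketch: SciPy returns `(x, y, z, w)` ordering, and the ordering the training code expects should be checked against the repo before relying on it:

```python
import numpy as np
from scipy.spatial.transform import Rotation

def rotation_to_quaternion(rot_mat):
    """Convert a 3x3 rotation matrix to a quaternion.

    SciPy's as_quat() returns (x, y, z, w); reorder if the
    dataset format expects (w, x, y, z) instead.
    """
    return Rotation.from_matrix(rot_mat).as_quat()
```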
Hi,
In our experience, we would first check if the location bounds are set up correctly. For example, these are the location bounds for RLBench and CALVIN. We would re-scale the position xyz to the range of [-1, 1].
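The rescaling to [-1, 1] is a simple per-axis affine map. A minimal sketch, where the bounds values below are placeholders; substitute the min/max corners of your own workspace, analogous to the RLBench/CALVIN bounds in the repo:

```python
import numpy as np

# Example workspace bounds (assumed values, replace with your own)
LOC_BOUNDS = np.array([[-0.3, -0.5, 0.0],   # min xyz
                       [ 0.7,  0.5, 0.6]])  # max xyz

def normalize_pos(xyz, bounds=LOC_BOUNDS):
    """Rescale a position from workspace bounds to [-1, 1] per axis."""
    lo, hi = bounds
    return 2.0 * (xyz - lo) / (hi - lo) - 1.0

def denormalize_pos(xyz_norm, bounds=LOC_BOUNDS):
    """Inverse map: [-1, 1] predictions back to workspace coordinates."""
    lo, hi = bounds
    return (xyz_norm + 1.0) / 2.0 * (hi - lo) + lo
```

If the bounds are too loose (or, as in the automatically calculated case below, not set at all), the normalized targets collapse into a narrow range and the position loss has little signal to fit.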
Secondly, you can check if the point cloud and the end-effector pose are embedded in the same coordinate system. One quick experiment for debugging is to train the Act3D baseline, which should give you a hint if the input/output are formatted correctly.
Lastly, for your reference, we collect our real-world demo following this script in this repo. Hope this will provide some hints to address your issue.
you can check if the point cloud and the end-effector pose are embedded in the same coordinate system
So any external camera needs to be transformed to the same world frame as the grasp pose?
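Yes, in practice that means applying the camera extrinsics to every depth-derived point cloud. A minimal sketch, where `T_base_cam` is a hypothetical 4x4 extrinsic matrix you would obtain from your own (e.g. hand-eye) calibration:

```python
import numpy as np

def camera_to_base(points_cam, T_base_cam):
    """Transform an (N, 3) point cloud from the camera frame into the
    robot base frame using a 4x4 homogeneous extrinsic matrix.
    """
    # Append a homogeneous 1 to each point, transform, then drop it.
    pts_h = np.concatenate([points_cam, np.ones((len(points_cam), 1))], axis=1)
    return (T_base_cam @ pts_h.T).T[:, :3]
```

After this transform, the point cloud and the `gripper_trans` from `robot_arm.get_pose()` live in the same frame, which is what the debugging suggestion above is checking.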
I had not put any bounds; the method was using some automatically calculated ones. I will try again with all these tips, thanks!
Hello,
Thank you for providing the training dataset. I noticed the dataset is in `.dat` format, and I was wondering if there are specific instructions available for generating data in this format. For creating a new dataset using a real Panda robot, I understand from your guidance to refer to this repository. To ensure compatibility, it appears necessary to save the data in the same `.dat` format, including matching the data fields. Additionally, for generating a new `instructions.pkl` file, am I correct in assuming that we can follow the process outlined in the CALVIN example? I am assuming that, as in your paper, 20 demonstrations would be enough. Should I capture the whole trajectory or just the keyframes? Could you please confirm these steps or provide further details if available? Are there other important considerations or instructions we should be aware of when creating and formatting a new dataset for this project?
Thank you for your assistance.