nickgkan / 3d_diffuser_actor

Code for the paper "3D Diffuser Actor: Policy Diffusion with 3D Scene Representations"
https://3d-diffuser-actor.github.io/
MIT License

Using custom data, and using real robot #7

Closed dennisushi closed 5 months ago

dennisushi commented 6 months ago

Hello,

Thank you for providing the training dataset. I noticed the dataset is in .dat format, and I was wondering if there are specific instructions available for generating data in this format.

For creating a new dataset using a real Panda robot, I understand from your guidance that we should refer to this repository. To ensure compatibility, it appears necessary to save the data in the same .dat format, including matching data fields. Additionally, for generating a new instructions.pkl file, am I correct in assuming that we can follow the process outlined in the CALVIN example? I am also assuming that, as in your paper, 20 demonstrations would be enough. Should I capture the whole trajectory or just the keyframes?

Could you please confirm these steps or provide further details if available? Are there other important considerations or instructions we should be aware of when creating and formatting a new dataset for this project?

Thank you for your assistance.

nickgkan commented 6 months ago

Hi @dennisushi, thanks for your interest! First, regarding the dataset format, you can find an example here or here. For the instructions, you're right, you can follow the process outlined in CALVIN. Depending on the task, you may need more or fewer demos. Based on our experience, 15 demos were enough for most of the real-world tasks we considered. We recorded only the keyposes and trained the model to predict the next keypose only. Good luck!
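For concreteness, here is a minimal sketch of what writing one episode to a .dat file could look like, assuming a blosc-compressed pickle, which is one common way such files are produced. The field names, shapes, and nesting below are illustrative assumptions, not the repo's exact schema; mirror the linked example files when in doubt.

    import pickle

    import blosc
    import numpy as np

    # Illustrative episode container; the real .dat layout may differ, so copy the
    # structure of the linked examples rather than this sketch for actual training data.
    n_keyposes, n_cams, h, w = 5, 2, 256, 256
    episode = {
        "frame_ids": list(range(n_keyposes)),
        "rgb": np.zeros((n_keyposes, n_cams, 3, h, w), dtype=np.uint8),
        "pcd": np.zeros((n_keyposes, n_cams, 3, h, w), dtype=np.float32),  # world-frame xyz per pixel
        "gripper": np.zeros((n_keyposes, 8), dtype=np.float32),  # xyz + quaternion + open flag
    }

    with open("ep0.dat", "wb") as f:
        f.write(blosc.compress(pickle.dumps(episode)))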

dennisushi commented 6 months ago

Thanks for the quick response!

dennisushi commented 6 months ago

Hi, I am struggling to reproduce the real-world results. I made a dataset of 14 demonstrations of the simplest task I could think of, reaching an object, varying the starting position of the robot and the position of the object in each demonstration. I trained the model for 600k steps with the default settings. The position loss barely moved. Can you give any tips on what may be wrong?

[Screenshot: training loss curves; the position loss barely decreases.]

For reference, this is what the demos look like (ep0.zip), each having 4-6 keyposes depending on the positions (should the number of keyposes be the same across all episodes?):

[Screenshots: ep0_1, ep0_4]

The state data is collected as:

    import numpy as np

    # Proprioceptive state per frame: translation (3) + quaternion (4) + open flag (1)
    gripper_trans = robot_arm.get_pose()[:3, 3]
    gripper_quat = rotation_to_quaternion(robot_arm.get_pose()[:3, :3])
    gripper_open = not robot_arm.is_grasped()
    gripper_input = np.concatenate([gripper_trans, gripper_quat, [gripper_open]])
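For context, a common heuristic in keypose-based manipulation work is to mark a frame as a keypose when the gripper open/close state flips or the arm comes to rest; the count does not need to be identical across episodes. A rough sketch of that heuristic (not necessarily the exact rule used in this repo), assuming per-frame gripper states and joint velocities are available:

    import numpy as np

    def detect_keyposes(gripper_open, joint_vel, vel_eps=1e-2):
        """Return keypose frame indices.

        gripper_open: (T,) bool array, gripper open/closed per frame.
        joint_vel: (T, DoF) array of joint velocities.
        A frame is a keypose if the gripper state flips or the arm just came to rest.
        """
        keyposes = []
        for t in range(1, len(gripper_open)):
            state_change = gripper_open[t] != gripper_open[t - 1]
            stopped_now = np.linalg.norm(joint_vel[t]) < vel_eps
            was_moving = np.linalg.norm(joint_vel[t - 1]) >= vel_eps
            if state_change or (stopped_now and was_moving):
                keyposes.append(t)
        # Always keep the final frame as a keypose.
        if len(gripper_open) - 1 not in keyposes:
            keyposes.append(len(gripper_open) - 1)
        return keyposes
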
twke18 commented 6 months ago

Hi,

In our experience, the first thing to check is whether the location bounds are set up correctly. For example, these are the location bounds for RLBench and CALVIN. We re-scale the xyz position to the range [-1, 1].
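As a concrete illustration, the rescaling can look like the following; the bound values here are placeholders for your robot's workspace, not the ones used in the paper.

    import numpy as np

    # Placeholder workspace bounds in metres (min_xyz, max_xyz); substitute your own.
    GRIPPER_LOC_BOUNDS = np.array([[-0.3, -0.5, 0.0],
                                   [ 0.7,  0.5, 0.6]])

    def normalize_pos(pos, bounds=GRIPPER_LOC_BOUNDS):
        """Map a world-frame xyz position into [-1, 1] per axis."""
        low, high = bounds[0], bounds[1]
        return (pos - low) / (high - low) * 2.0 - 1.0

    def denormalize_pos(pos_norm, bounds=GRIPPER_LOC_BOUNDS):
        """Inverse mapping from [-1, 1] back to world-frame xyz."""
        low, high = bounds[0], bounds[1]
        return (pos_norm + 1.0) / 2.0 * (high - low) + low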

Secondly, you can check whether the point cloud and the end-effector pose are expressed in the same coordinate system. One quick experiment for debugging is to train the Act3D baseline, which should give you a hint as to whether the inputs/outputs are formatted correctly.
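For an external camera, this typically means applying the camera extrinsics so the point cloud lives in the same robot-base/world frame as the gripper pose. A minimal sketch, assuming a 4x4 extrinsic matrix T_base_cam from your calibration:

    import numpy as np

    def camera_to_base(points_cam, T_base_cam):
        """Transform an (N, 3) camera-frame point cloud into the robot base frame.

        T_base_cam: 4x4 homogeneous transform from camera frame to base frame
        (e.g. from hand-eye calibration).
        """
        points_h = np.concatenate([points_cam, np.ones((len(points_cam), 1))], axis=1)  # (N, 4)
        return (T_base_cam @ points_h.T).T[:, :3]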

Lastly, for your reference, we collected our real-world demos following this script in this repo. Hope this provides some hints for addressing your issue.

dennisushi commented 6 months ago

> you can check whether the point cloud and the end-effector pose are expressed in the same coordinate system

So the point cloud from any external camera needs to be transformed into the same world frame as the grasp pose?

I had not set any bounds; the method was using some automatically calculated ones. I will try again with all these tips, thanks!
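One way to set the bounds explicitly is to derive them from the gripper positions seen in the collected demos plus a small margin. This is only a rough sketch of that idea, not the method's automatic calculation:

    import numpy as np

    def bounds_from_demos(gripper_positions, margin=0.05):
        """Compute (2, 3) min/max xyz bounds from all demo gripper positions.

        gripper_positions: (N, 3) array of end-effector xyz over all episodes.
        margin: padding in metres so targets near the edge are not clipped.
        """
        low = gripper_positions.min(axis=0) - margin
        high = gripper_positions.max(axis=0) + margin
        return np.stack([low, high])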