huggingface / lerobot

🤗 LeRobot: Making AI for Robotics more accessible with end-to-end learning

Thomwolf 2024 06 12 deep dive act #265

Closed. thomwolf closed this 3 months ago

thomwolf commented 3 months ago

[DRAFT WIP] Work in progress: a deep dive into the differences between our ACT implementation for real-world data and the implementation from https://github.com/thomwolf/ACT

The goal is to see if we can find room for improvement in our short-data ACT trainings (reducing jitter under the same conditions as the original ACT code).
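For context, one mechanism the original ACT code uses to reduce jitter is temporal ensembling: at each control step it averages all past chunks' predictions for the current timestep with exponential weights. A rough numpy sketch of that averaging (variable names are illustrative, not the exact code of either implementation):

    import numpy as np

    def temporal_ensemble(all_time_actions, t, k=0.01):
        # all_time_actions[i, t] holds the action that the chunk predicted at
        # timestep i proposes for timestep t (zeros where no prediction exists).
        candidates = all_time_actions[:, t]
        populated = np.abs(candidates).sum(axis=1) != 0
        candidates = candidates[populated]
        # Exponentially decaying weights: the oldest prediction gets weight 1.
        weights = np.exp(-k * np.arange(len(candidates)))
        weights /= weights.sum()
        return (weights[:, None] * candidates).sum(axis=0)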

Currently listed differences:



haixuanTao commented 3 months ago

Just a quick note on gym_real_world. At Pollen Robotics, we tried to add a new task id input and ended up with a raw control loop along these lines:

import time

import cv2

policy = make_policy(hydra_cfg, pretrained_policy_name_or_path)
camera = cv2.VideoCapture(0)

while True:
    # Grab the latest camera frame and joint positions from the robot.
    _, frame = camera.read()
    observation = {
        "image": frame,
        "qpos": dynamixel.read([0, 1, 2, 3, 4, 5, 6]),
    }

    observation = preprocess_observation(observation)

    action = policy.select_action(observation)

    # Keep the control loop close to the target frequency.
    time.sleep(1 / fps - inference_time)

    dynamixel.write(action, [0, 1, 2, 3, 4, 5, 6])


I think we should let users define their own gym_env and manage it themselves if they want to, but that should probably not be the default way of using lerobot.

So basically, I'm opening the discussion: can and should we remove the gym environment at inference time for the real-world setting?
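For contrast, if lerobot did keep a gym-style interface at inference time, the same loop could sit behind an environment wrapper. A minimal sketch, assuming gymnasium and the same hypothetical camera / dynamixel helpers as above (this is not an existing lerobot API):

    import gymnasium as gym
    import numpy as np

    class RealRobotEnv(gym.Env):
        """Hypothetical gym-style wrapper around the camera + Dynamixel loop."""

        def __init__(self, camera, dynamixel, joint_ids=(0, 1, 2, 3, 4, 5, 6)):
            self.camera = camera
            self.dynamixel = dynamixel
            self.joint_ids = list(joint_ids)
            self.observation_space = gym.spaces.Dict({
                "image": gym.spaces.Box(0, 255, shape=(480, 640, 3), dtype=np.uint8),
                "qpos": gym.spaces.Box(-np.inf, np.inf, shape=(len(self.joint_ids),)),
            })
            self.action_space = gym.spaces.Box(-np.inf, np.inf, shape=(len(self.joint_ids),))

        def _get_obs(self):
            _, frame = self.camera.read()
            qpos = np.array(self.dynamixel.read(self.joint_ids), dtype=np.float32)
            return {"image": frame, "qpos": qpos}

        def reset(self, seed=None, options=None):
            super().reset(seed=seed)
            return self._get_obs(), {}

        def step(self, action):
            self.dynamixel.write(action, self.joint_ids)
            # No task reward or termination signal is available on the real robot.
            return self._get_obs(), 0.0, False, False, {}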
thomwolf commented 3 months ago

Thanks @haixuanTao, maybe this is more a comment for https://github.com/huggingface/lerobot/pull/246?

This PR is really just a deep dive into the algorithmic differences between our implementation of ACT and the original, in particular on the model side. I'll probably close this PR and open a series of smaller ones addressing some of these differences.