michel-aractingi opened 1 week ago
We can use the PushT environment to test whether RLPD is working properly. This will also allow us to compare against our baseline RL algorithm, TD-MPC.
PushT has two observation modes: an image state and a privileged vector state with 'keypoints'. Training on the keypoints state is an easier task and is useful to quickly validate that your implementation is working. Training on the image state is our end goal.
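For illustration, here is a minimal sketch of selecting the two modes with the `gym-pusht` package (the exact `obs_type` strings are assumptions and may differ in your installed version):

```python
import gymnasium as gym
import gym_pusht  # noqa: F401  # registers the gym_pusht/PushT-v0 environment

# Privileged keypoint/state observations: a low-dimensional vector, easier to learn from.
# NOTE: the obs_type values below are assumptions about gym-pusht and may differ in your version.
env_keypoints = gym.make("gym_pusht/PushT-v0", obs_type="environment_state_agent_pos")

# Image observations (pixels + agent position): our end goal.
env_image = gym.make("gym_pusht/PushT-v0", obs_type="pixels_agent_pos")

obs, info = env_keypoints.reset(seed=0)
print(obs)  # keypoint vector (+ agent position) here, vs. an image dict for env_image
```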
You can try training PushT with our TD-MPC implementation to get a better idea. Here are the relevant config files and training commands. Make sure that you enable wandb and set it up properly on your system so that you can monitor training and watch the eval runs.
Config files (change the extension to `.yaml` and add them to `lerobot/lerobot/configs/policy`):
- tdmpc_pusht_keypoints.txt
- tdmpc_pusht.txt

Run the training commands in the following files:
- train_pusht_keypoints.txt
- train_pusht.txt
For more references on TD-MPC: the main paper, the FOWM paper, and Alexander Soare's videos 1 and 2.
Thanks for initiating this! I would actually recommend using Cartesian space control whenever you can do that, as in our experience it simplifies a lot of stuff in the learning process.
But I guess many people following this PR are also interested in using RL for low-cost robots that don't have built-in EE control, so I am also curious how that works in practice.
@jeffreywu13579 @charlesxu0124
HIL-SERL in LeRobot
On porting HIL-SERL to LeRobot. This page outlines the minimal list of components and tasks that should be implemented in the LeRobot codebase. The official reference implementation is available in JAX here.
We will coordinate on Discord in #port-hil-serl. We will update this page with the IDs of the owners of each component. We encourage several people to work as a team on each component. You don't need to write extensive code on your own to make a valuable contribution; any input on a sub-component, however small, is appreciated. Feel free to add extra components to the list if needed; this is only a guide, and we welcome more ideas.
Note: in parallel, we are refactoring the codebase, so you don't need to refactor anything yourself. Do not hesitate to copy files and code elements to arrive at a first working version as fast as possible.
RLPD (Reinforcement Learning with Prior Data)
- Implement the RLPD policy in `lerobot/lerobot/common/policies/hilserl`.
- Modify `lerobot/scripts/train.py` for offline and online data buffers and the dataloader.
- Reference: the TD-MPC implementation in LeRobot, `lerobot/common/policies/tdmpc/`.
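The main data-handling requirement of RLPD is that every gradient batch is drawn half from the offline (prior/demonstration) buffer and half from the online replay buffer. Below is a minimal sketch of that 50/50 sampling, assuming plain lists of transition dicts; the names `offline_buffer`, `online_buffer` and the transition keys are illustrative, not existing LeRobot APIs.

```python
import random
import torch


def sample_mixed_batch(offline_buffer, online_buffer, batch_size=256, device="cpu"):
    """Build one RLPD-style batch: half offline (prior) data, half online replay data.

    Both buffers are assumed to be lists of transition dicts sharing the same keys,
    e.g. {"observation", "action", "reward", "next_observation", "done"}.
    """
    half = batch_size // 2
    transitions = random.sample(offline_buffer, half) + random.sample(online_buffer, batch_size - half)
    random.shuffle(transitions)

    # Collate the list of transition dicts into a dict of batched tensors.
    return {
        key: torch.stack([torch.as_tensor(t[key]) for t in transitions]).to(device)
        for key in transitions[0]
    }
```

In the training loop, a batch like this would feed the SAC-style critic/actor updates, with the online buffer growing as the policy collects new transitions.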
Human Interventions
- Extend the `record` function, possibly interfaced with keyboard keys, to stop the policy and give the user a few seconds to be ready to take over.
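As one possible shape for the keyboard interface, here is a hedged sketch using `pynput`; the toggle key, the `read_teleop_action` helper, and the loop structure are illustrative assumptions, not existing LeRobot code.

```python
from pynput import keyboard  # assumption: pynput is available for keyboard events

intervention = {"active": False}


def on_press(key):
    # Toggle human take-over with the 'i' key.
    if getattr(key, "char", None) == "i":
        intervention["active"] = not intervention["active"]
        print("Human intervention:", "ON" if intervention["active"] else "OFF")


listener = keyboard.Listener(on_press=on_press)
listener.start()

# Inside the control loop of `record`, the flag decides who supplies the action.
# read_teleop_action is hypothetical; a short pause after toggling gives the user
# time to grab the leader arm before their actions are recorded.
# while not done:
#     if intervention["active"]:
#         action = read_teleop_action()
#     else:
#         action = policy.select_action(obs)
#     obs, reward, done, info = env.step(action)
```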
Reward Classifier

- Used in `lerobot/scripts/eval.py` or in the RLPD code to query the reward every time a new frame is added to the online dataset.
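In HIL-SERL the reward classifier is a binary success detector over image frames (the paper builds it on top of a pretrained vision encoder). A self-contained PyTorch sketch of the idea, with a deliberately small CNN standing in for that backbone:

```python
import torch
import torch.nn as nn


class RewardClassifier(nn.Module):
    """Binary success classifier: maps an image frame to a sparse reward in {0, 1}."""

    def __init__(self, in_channels: int = 3):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(in_channels, 16, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head = nn.Linear(32, 1)

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        # frames: (B, C, H, W) float tensor in [0, 1]; returns success logits of shape (B,)
        return self.head(self.encoder(frames)).squeeze(-1)


@torch.no_grad()
def compute_reward(classifier: RewardClassifier, frame: torch.Tensor, threshold: float = 0.5) -> float:
    """Query the classifier for a single frame before it is written to the online dataset."""
    prob = torch.sigmoid(classifier(frame.unsqueeze(0)))[0]
    return float(prob > threshold)
```

A `compute_reward`-style call would run on each new frame before the transition is added to the online buffer, and the same classifier could replace the environment reward during eval.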
Other Implementations

- `ManipulatorRobot`: we need to make sure the name linked to each address is the same for Feetech and Dynamixel motors. For Feetech, the velocity is read from `Present_Speed` and the torque from `Present_Current`.
- Note: the paper uses end-effector control and velocity control for dynamic tasks, but our first implementation won't include them.
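To make the "same name for the same address" idea concrete, here is an illustrative alias table. The Feetech register names come from the note above; the Dynamixel names and the `bus.read` interface are assumptions to be checked against the actual motor drivers in LeRobot.

```python
# Illustrative only: one logical name per quantity, mapped to the register each motor brand exposes.
REGISTER_ALIASES = {
    "feetech": {"velocity": "Present_Speed", "torque": "Present_Current"},
    "dynamixel": {"velocity": "Present_Velocity", "torque": "Present_Current"},  # assumed names
}


def read_state(bus, brand: str, motor_id: int) -> dict:
    """Read velocity and torque under the same logical names regardless of motor brand."""
    # `bus.read(register, motor_id)` is a hypothetical interface, not the LeRobot motors API.
    return {name: bus.read(register, motor_id) for name, register in REGISTER_ALIASES[brand].items()}
```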