simpler-env / SimplerEnv

Evaluating and reproducing real-world robot manipulation policies (e.g., RT-1, RT-1-X, Octo) in simulation under common setups (e.g., Google Robot, WidowX+Bridge) (CoRL 2024)
https://simpler-env.github.io/
MIT License

Controller difference for different envs #11

Open HuFY-dev opened 4 months ago

HuFY-dev commented 4 months ago

Hi, thanks for the great work! I'm playing around with your environments and found that both RT-1 and Octo seem to be capable of only the google_robot tasks, not the widowx tasks.

Further, I noticed that the google_robot environments use the arm_pd_ee_delta_pose + gripper_pd_joint_target_delta_pos controllers, while the widowx environments use the arm_pd_ee_target_delta_pose + gripper_pd_joint_pos controllers. Note that both the arm and gripper controllers differ. However, in your model wrappers you seem to treat the world_vector and rot_axangle the same way regardless of the controller. I wonder if that is what causes the models to fail the widowx tasks.

FYI, I got the controllers using env.unwrapped.control_mode. More details on controllers can be found here

xuanlinli17 commented 4 months ago

Hi,

Octo models do not fail on the widowx tasks. The controllers we use are here: https://github.com/simpler-env/SimplerEnv/blob/main/simpler_env/utils/env/env_builder.py .

Note that "delta_pose_align" is different from "delta_pose". See https://github.com/simpler-env/ManiSkill2_real2sim/blob/cd45dd27dc6bb26d048cb6570cdab4e3f935cc37/mani_skill2_real2sim/agents/controllers/pd_ee_pose.py#L202 for "align" vs "non-align". In simple terms, "delta_pose_align" decouples translation and rotation, instead of directly multiplying two SE(3) matrices.

HuFY-dev commented 4 months ago

Thanks, let me look into that!

HuFY-dev commented 4 months ago

Also, why is there no difference in how the different arm controllers are handled in the model wrappers? It seems like only the gripper actions are handled differently.

In terms of gripper actions, can you explain why you set self.sticky_gripper_num_repeat = 15 for google and 1 for bridge? It seems to come out of nowhere.

xuanlinli17 commented 4 months ago

The sticky_gripper_num_repeat was set only for Octo models to match the real eval. Our implementations of RT-* and Octo match the real eval setup.

OpenVLA likely uses a different setup from RT-* and Octo for real eval, so we would need to contact the authors to verify whether (and how) they implement sticky actions in real eval. Other things, such as action ensembling and the observation/action horizon lengths, need to be verified too.
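As a rough illustration of the action-ensembling point, here is a minimal temporal-ensembling sketch (a common scheme in this family of policies; the exact weighting used by any particular model is an assumption here): predictions made at different past steps that all target the current timestep are averaged, with newer predictions weighted more heavily.

```python
import numpy as np

def temporal_ensemble(pred_history, weight=0.5):
    """Average overlapping action predictions for the current timestep.

    pred_history: list of action arrays, newest prediction first.
    weight: geometric decay applied to older predictions (hypothetical choice).
    """
    preds = np.stack(pred_history)                # (k, action_dim)
    w = weight ** np.arange(len(preds))           # 1, w, w^2, ... newest first
    w = w / w.sum()                               # normalize to a convex combination
    return (w[:, None] * preds).sum(axis=0)
```

Whether a policy ensembles at all, and with what decay, changes its effective behavior, which is why it has to be matched against the real-eval setup.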

Could you share some videos of OpenVLA failing on Bridge? Is the arm reaching the object? If it's not reaching, most likely there are implementation issues.

HuFY-dev commented 4 months ago

One failure case of OpenVLA:

(video attachment: output)

Here are some explanations from the OpenVLA authors, but I don't think they are clear enough.

Can you share where you found the sticky_gripper_num_repeat values in the original setup from the Octo authors?

xuanlinli17 commented 4 months ago

This is almost surely an implementation issue. It looks like the rotation ordering might be wrong in the Bridge envs for OpenVLA; OpenVLA might output ypr instead of rpy on Bridge. That is, on Bridge evaluations, Octo uses roll, pitch, yaw = action_rotation_delta, while for OpenVLA it might be yaw, pitch, roll = action_rotation_delta, or something like pitch, roll, yaw = action_rotation_delta (you can tweak the different combinations).
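To make the ordering issue concrete, here is a small numpy sketch (my own; the intrinsic x-y-z convention below is an assumption, not necessarily what either model uses) showing that reading the same 3-vector as (roll, pitch, yaw) vs (yaw, pitch, roll) produces different rotations:

```python
import numpy as np

def euler_to_matrix(roll, pitch, yaw):
    """x-y-z (roll, pitch, yaw) Euler angles to a rotation matrix: Rz @ Ry @ Rx."""
    cr, sr = np.cos(roll), np.sin(roll)
    cp, sp = np.cos(pitch), np.sin(pitch)
    cy, sy = np.cos(yaw), np.sin(yaw)
    Rx = np.array([[1, 0, 0], [0, cr, -sr], [0, sr, cr]])
    Ry = np.array([[cp, 0, sp], [0, 1, 0], [-sp, 0, cp]])
    Rz = np.array([[cy, -sy, 0], [sy, cy, 0], [0, 0, 1]])
    return Rz @ Ry @ Rx

delta = np.array([0.1, -0.2, 0.3])  # raw 3-dim rotation output from a policy

# Octo-style reading: roll, pitch, yaw = delta
R_rpy = euler_to_matrix(*delta)

# Hypothesized alternative reading: yaw, pitch, roll = delta
yaw, pitch, roll = delta
R_ypr = euler_to_matrix(roll, pitch, yaw)
```

Both are valid rotation matrices, but they disagree, so a wrapper that assumes the wrong ordering sends systematically wrong rotation commands.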

sticky_gripper_num_repeat is not in the public Octo repo. I communicated with the authors directly and referenced their real eval implementations.

HuFY-dev commented 4 months ago

I tried all the arrangements and none of them worked. The task is to pick up the spoon, so the gripper should move forward to do that. However, every time it goes straight down without moving forward, which is very weird.

Toradus commented 4 months ago

Have you tested other tasks? I'm currently training my own custom model and trying to evaluate it on SimplerEnv. It works very well for Fractal, but it also struggles a lot when trained on the Octo-provided Bridge_v2 data and evaluated on the spoon task. I also see the robot driving into the table on the spoon task, but on the carrot task it is at least aiming at the right target. Note that this could easily be a model problem on my end, but it's interesting that the OpenVLA eval looks about the same as mine. I copy-pasted the OctoInference provided by Simpler and adjusted it to work with my model. Did OpenVLA do the same?

https://github.com/simpler-env/SimplerEnv/assets/23532458/05db04a4-f943-4652-aa08-241c3fbd3d90

https://github.com/simpler-env/SimplerEnv/assets/23532458/91ac59cc-ece2-48bc-9c63-2dd54aeb5700

HuFY-dev commented 4 months ago

Yes, my model wrapper was also modified from the Octo code. However, from my understanding, the main parts of that code are converting the rotation from rpy to axis-angle and performing the sticky-gripper logic, but it seems that what's going wrong is the translation. Here are some carrot examples:

(video attachment: output (1))

(video attachment: output (2))