Open HuFY-dev opened 4 months ago
Hi,
Octo models do not fail on the widowx tasks. The controllers we use are listed here: https://github.com/simpler-env/SimplerEnv/blob/main/simpler_env/utils/env/env_builder.py .
Note that "delta_pose_align" is different from "delta_pose". You can see https://github.com/simpler-env/ManiSkill2_real2sim/blob/cd45dd27dc6bb26d048cb6570cdab4e3f935cc37/mani_skill2_real2sim/agents/controllers/pd_ee_pose.py#L202 for "align" vs "non-align". In simple terms, "delta_pose_align" decouples translation and rotation, instead of directly multiplying 2 SE(3) matrices.
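To make the "align" vs "non-align" distinction concrete, here is a minimal NumPy sketch of the two ways a delta action can be applied to an end-effector pose. The function names and the choice of frame for each update are illustrative assumptions, not the ManiSkill2 implementation; the linked pd_ee_pose.py is the authoritative reference.

```python
import numpy as np

def axangle_to_matrix(axangle):
    """Rodrigues' formula: rotation vector (axis * angle) -> 3x3 rotation matrix."""
    axangle = np.asarray(axangle, dtype=float)
    angle = np.linalg.norm(axangle)
    if angle < 1e-8:
        return np.eye(3)
    x, y, z = axangle / angle
    K = np.array([[0.0, -z, y], [z, 0.0, -x], [-y, x, 0.0]])
    return np.eye(3) + np.sin(angle) * K + (1.0 - np.cos(angle)) * (K @ K)

def make_pose(R, p):
    """Build a 4x4 homogeneous transform from a rotation matrix and a position."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = p
    return T

def apply_delta_coupled(T_cur, delta_pos, delta_axangle):
    """Hypothetical 'non-align' update: multiply two SE(3) matrices,
    so the translation is expressed in the current end-effector frame."""
    T_delta = make_pose(axangle_to_matrix(delta_axangle), delta_pos)
    return T_cur @ T_delta

def apply_delta_decoupled(T_cur, delta_pos, delta_axangle):
    """Hypothetical 'align' update: translation and rotation are applied
    independently, so the translation does not depend on the current rotation."""
    R_new = axangle_to_matrix(delta_axangle) @ T_cur[:3, :3]
    p_new = T_cur[:3, 3] + np.asarray(delta_pos, dtype=float)
    return make_pose(R_new, p_new)

# End effector rotated 90 degrees about z, located at (0.3, 0.0, 0.2).
T_cur = make_pose(axangle_to_matrix([0.0, 0.0, np.pi / 2]), [0.3, 0.0, 0.2])
delta_pos = [0.05, 0.0, 0.0]   # "move 5 cm along x"
delta_rot = [0.0, 0.0, 0.0]    # no rotation

coupled = apply_delta_coupled(T_cur, delta_pos, delta_rot)
decoupled = apply_delta_decoupled(T_cur, delta_pos, delta_rot)
print("coupled position:  ", coupled[:3, 3])
print("decoupled position:", decoupled[:3, 3])
```

With the same action, the coupled update moves the gripper along its own rotated x axis, while the decoupled update moves it along the fixed x axis. A policy trained under one convention and evaluated under the other would therefore move in the wrong direction whenever the gripper is rotated.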
Thanks, let me look into that!
Also, why is there no difference in how the model wrappers handle the different arm controllers? It seems like only the gripper actions are handled differently.
In terms of gripper actions, can you explain why you set self.sticky_gripper_num_repeat = 15 for google and 1 for bridge? I feel like it came from nowhere.
The sticky_gripper_num_repeat was set only for Octo models to match the real eval. Our implementations of RT-* and Octo match the real eval setup.
OpenVLA likely uses a different setup from RT-* and Octo for real eval, so we need to contact the authors to verify whether (and how) they implement the sticky actions in real eval. There are other things, like action ensembling and obs len / action len, that need to be verified too.
Could you share some videos of OpenVLA failing on Bridge? Is the arm reaching the object? If it's not reaching, most likely there are implementation issues.
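For readers unfamiliar with the sticky gripper logic discussed above, the following is a rough, standalone sketch of how such a repeat counter can work. The class name, the 0.5 threshold, and the sign convention are assumptions made for illustration; the actual wrapper code in SimplerEnv may differ in its details.

```python
class StickyGripper:
    """Illustrative sketch: once the gripper command changes sharply,
    keep emitting that change for num_repeat consecutive steps."""

    def __init__(self, num_repeat):
        self.num_repeat = num_repeat
        self.sticky_on = False
        self.repeat_count = 0
        self.sticky_action = 0.0
        self.previous = None

    def step(self, current):
        # Relative action: change between the previous and current command.
        relative = 0.0 if self.previous is None else self.previous - current
        self.previous = current

        # A large change (assumed threshold 0.5) starts a sticky phase.
        if abs(relative) > 0.5 and not self.sticky_on:
            self.sticky_on = True
            self.sticky_action = relative

        # While sticky, repeat the stored action instead of the raw one.
        if self.sticky_on:
            self.repeat_count += 1
            relative = self.sticky_action

        # End the sticky phase after num_repeat steps.
        if self.repeat_count == self.num_repeat:
            self.sticky_on = False
            self.repeat_count = 0
            self.sticky_action = 0.0
        return relative


def run(num_repeat, commands):
    """Feed a sequence of absolute gripper commands through the sticky logic."""
    gripper = StickyGripper(num_repeat)
    return [gripper.step(c) for c in commands]


commands = [1.0, 1.0, 0.0, 0.0, 0.0, 0.0]  # gripper command flips at step 3
print(run(3, commands))
print(run(1, commands))
```

In this sketch, num_repeat = 1 passes the change through for a single step, while a larger value holds it for several steps, which is the kind of difference between the google (15) and bridge (1) settings being asked about.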
One failure case of OpenVLA:
Here are some explanations from the OpenVLA authors, but I don't think they are clear enough.
Can you share where you found the sticky_gripper_num_repeat in the original setting from the authors of Octo?
This is almost surely an implementation issue. It looks like the rotation ordering might be wrong for the Bridge envs for OpenVLA: OpenVLA might output ypr instead of rpy for Bridge. That is, on Bridge evaluations, Octo uses roll, pitch, yaw = action_rotation_delta, whereas for OpenVLA it might be yaw, pitch, roll = action_rotation_delta, or something like pitch, roll, yaw = action_rotation_delta (you can experiment with different combinations).
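One way to try the different combinations systematically is to enumerate all six interpretations of the three rotation outputs. This is only a sketch: candidate_orderings is a hypothetical helper, and action_rotation_delta stands in for the three rotation values produced by the model.

```python
from itertools import permutations

AXES = ("roll", "pitch", "yaw")

def candidate_orderings(action_rotation_delta):
    """Yield (ordering, (roll, pitch, yaw)) for every way of interpreting
    the model's three rotation outputs."""
    for ordering in permutations(AXES):
        # ordering[i] names the axis assumed for the i-th model output.
        named = dict(zip(ordering, action_rotation_delta))
        yield ordering, (named["roll"], named["pitch"], named["yaw"])

action_rotation_delta = [0.1, 0.2, 0.3]  # example model output
results = list(candidate_orderings(action_rotation_delta))
for ordering, rpy in results:
    print(ordering, "->", rpy)
```

Each resulting (roll, pitch, yaw) tuple can then be passed to the existing Euler-to-axis-angle conversion in the model wrapper and evaluated on a few episodes.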
sticky_gripper_num_repeat is not in the public Octo repo. I communicated with the authors directly and referenced their real eval implementations.
I tried all arrangements and none of them worked. The task is to pick up the spoon, and the gripper should move forward in order to do that. However, it always goes straight down without moving forward, which is very weird.
Have you tested with other tasks? I'm currently training my own custom model and trying to evaluate it on SimplerEnv. It works very nicely for Fractal, but it struggles a lot when trained on the Octo-provided Bridge_v2 and evaluated on the spoon task. I also get the same result of the robot driving into the table on the spoon task, but on the carrot task it is at least aiming at the right target. Note that this could easily be a model problem on my end, but it's interesting that the OpenVLA eval looks about the same as my eval. I copy-pasted the OctoInference provided by Simpler and adjusted it to work with my model. Did OpenVLA do the same?
https://github.com/simpler-env/SimplerEnv/assets/23532458/05db04a4-f943-4652-aa08-241c3fbd3d90
https://github.com/simpler-env/SimplerEnv/assets/23532458/91ac59cc-ece2-48bc-9c63-2dd54aeb5700
Yes, my model wrapper was also modified from the Octo code. However, from my understanding, the main part of that code converts the rotation from rpy to axis-angle and performs the sticky gripper logic, whereas what seems to be going wrong is the translation. Here are some carrot examples:
Hi, thanks for the great work! I'm playing around with your environments and found that both RT1 and Octo seem to be capable of only the google_robot tasks but not the widowx tasks.
Further, I noticed that google_robot environments use the arm_pd_ee_delta_pose + gripper_pd_joint_target_delta_pos controllers while widowx environments use the arm_pd_ee_target_delta_pose + gripper_pd_joint_pos controllers. Notice that both the arm and gripper controllers are different. However, in your model wrappers, you seem to be treating the world_vector and rot_axangle the same way regardless of the difference in the controller. I wonder if that's causing the models to fail widowx tasks.
FYI, I got the controllers using env.unwrapped.control_mode. More details on controllers can be found here