huggingface / gym-aloha

A gym environment for ALOHA
Apache License 2.0
59 stars 19 forks source link

Success registration for cube transfer is questionable. #4

Open alexander-soare opened 6 months ago

alexander-soare commented 6 months ago

This registers as a success, but it shouldn't, given that the task is to transfer the cube from one arm to another.

https://github.com/huggingface/gym-aloha/assets/16543381/3aae99a0-b714-4f88-ba8b-545d6efa69ae

Here is the script for reproduction:


import gym_aloha
import gymnasium as gym
import numpy as np
import imageio
from lerobot.common.utils.io_utils import write_video

kwargs = {
    "obs_type": "pixels_agent_pos",
    "render_mode": "rgb_array",
    "max_episode_steps": 400,
    "visualization_width": 384,
    "visualization_height": 384,
}
env = gym.make("gym_aloha/AlohaTransferCube-v0", disable_env_checker=True, **kwargs)

with open("actions.txt", "r") as f:
    actions = [[float(s) for s in l.strip().split(" ")] for l in f.readlines()]

env.reset(seed=1001)

frames = [env.render()]
for action in actions:
    env.step(np.array(action))
    frames.append(env.render())

write_video("output.mp4", np.stack(frames), fps=50)

See the actions.txt file attached. actions.txt

aliberts commented 6 months ago

As per the original code, this is expected behavior from the reward:

if touch_left_gripper and not touch_table: # successful transfer
    reward = 4

Should we change the reward to something like this:

self.success = np.full(total_steps, False)

...

if touch_both_grippers and not touch_table:
    self.success[current_step] = True
    if self.success[-10:].all():
        reward = 4

cc @alexander-soare @Cadene

alexander-soare commented 6 months ago

@aliberts thanks for picking this up.

I was thinking more like:

# using a list fixes a bug, that you probs would have caught anyway
# calling it something other than "success" makes it clear that it is not the true success conditioh
self._transient_success = []  

...

# most important change is the condition
if touch_left_gripper and not touch_right_gripper and not touch_table:
    self._transient_success.append(True)
else:
    self._transient_success.append(False)

# Use simulation time (here 0.5 seconds) rather than timesteps.
if all(self._transient_success[-int(0.5/dt):]):
    reward = 4
    success = True  

The condition is all I really meant to change, but then I got carried away :P

Regarding the higher level question of backwards compatibility vs correctness, maybe we can have both versions somehow available. But in any case, in the interest of making robotics really work IRL, IMO we should aim for correctness always.