Open alexander-soare opened 6 months ago
As per the original code, this is expected behavior from the reward:
if touch_left_gripper and not touch_table: # successful transfer
reward = 4
Should we change the reward to something like this:
self.success = np.full(total_steps, False)
...
if touch_both_grippers and not touch_table:
self.success[current_step] = True
if self.success[-10:].all():
reward = 4
cc @alexander-soare @Cadene
@aliberts thanks for picking this up.
I was thinking more like:
# using a list fixes a bug, that you probs would have caught anyway
# calling it something other than "success" makes it clear that it is not the true success conditioh
self._transient_success = []
...
# most important change is the condition
if touch_left_gripper and not touch_right_gripper and not touch_table:
self._transient_success.append(True)
else:
self._transient_success.append(False)
# Use simulation time (here 0.5 seconds) rather than timesteps.
if all(self._transient_success[-int(0.5/dt):]):
reward = 4
success = True
The condition is all I really meant to change, but then I got carried away :P
Regarding the higher level question of backwards compatibility vs correctness, maybe we can have both versions somehow available. But in any case, in the interest of making robotics really work IRL, IMO we should aim for correctness always.
This registers as a success, but it shouldn't, given that the task is to transfer the cube from one arm to another.
https://github.com/huggingface/gym-aloha/assets/16543381/3aae99a0-b714-4f88-ba8b-545d6efa69ae
Here is the script for reproduction:
See the actions.txt file attached. actions.txt