Currently, when EEF limits are enabled during EVAL and those limits are violated, every rollout in the EVAL gets terminated early. This continues until the configured number of eval steps/rollouts is completed.
The same thing happens if an object falls.
Should we accept this behavior? Early terminations should be common early in training but become rarer as the robot learns.
Should we turn off these limits with a flag? (TODO)
The code does not crash, but you get a null evaluation result.
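A minimal sketch of the flag idea: gate the EEF-limit termination behind a constructor flag so eval rollouts are not cut short. The names here (`terminate_on_limit_violation`, `_check_eef_limits`) are assumptions for illustration, not the actual picking_env API.

```python
class PickingEnvSketch:
    """Illustrative stand-in for the picking env; not the real class."""

    def __init__(self, terminate_on_limit_violation=True):
        # When False, limit violations are counted but do not end the rollout.
        self.terminate_on_limit_violation = terminate_on_limit_violation
        self.violations = 0

    def _check_eef_limits(self, eef_pos, lo=-0.5, hi=0.5):
        # Hypothetical axis-aligned workspace check on the EEF position.
        return all(lo <= p <= hi for p in eef_pos)

    def step_should_terminate(self, eef_pos):
        if not self._check_eef_limits(eef_pos):
            self.violations += 1
            return self.terminate_on_limit_violation
        return False
```

With the flag off, an eval run would record the violation but keep collecting the rollout instead of returning a null result.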
After further review, the EEF workspace limits seem to work well.
commit 2ede60e76d7ca1460f38b9fb7d9db3d7177bc7af in picking_env fixed many bugs.
Still, after a fallen_object, if we are using HER, reset_internal() at L1306 calls self.update_object_goal_her_poses(); at that point the new observables have not yet been updated, so reading the pose of a stale object triggers an exception.
Observables are not yet set up at that point; they are set in base.py:MujocoEnv.reset() (which runs at the end, after reset_internal()). Would moving the HER goal update to after the observables setup work?
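A toy sketch of the proposed reordering: defer update_object_goal_her_poses() until after the observables are (re)created, instead of calling it inside reset_internal(). The class and method bodies below are simplified stand-ins, not the real base.py/picking.py code.

```python
class ResetOrderSketch:
    """Models the call order inside reset(); names are illustrative."""

    def __init__(self):
        self.observables = None
        self.call_order = []

    def _setup_observables(self):
        # In the real code this happens in base.py:MujocoEnv.reset().
        self.observables = {"object_pos": (0.0, 0.0, 0.1)}
        self.call_order.append("setup_observables")

    def update_object_goal_her_poses(self):
        # Stand-in for the HER goal update; in the original ordering this
        # runs before observables exist and raises on the stale object.
        if self.observables is None:
            raise RuntimeError("observables not set up yet")
        self.call_order.append("update_her_goals")

    def reset(self):
        # Proposed order: observables first, HER goal update second.
        self._setup_observables()
        self.update_object_goal_her_poses()
```

Calling update_object_goal_her_poses() before reset() reproduces the exception; calling reset() shows the reordered sequence succeeding.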
Checks for fallen objects happen inside picking.py._get_obs(), which gets called from the picking.py.step() method inside a control loop (about 30 cycles per policy step).
Could add a check on the flag: once it is true, skip any further checking.
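A sketch of that latch, assuming a hypothetical `object_fell` flag: since _get_obs() runs ~30 times per policy step, the fallen-object check fires once and is then skipped for the rest of the episode. All names below are illustrative.

```python
class FallenObjectLatch:
    """Illustrative latch; not the real picking.py._get_obs()."""

    def __init__(self):
        self.object_fell = False   # hypothetical flag; reset on env reset
        self.checks_run = 0

    def _object_below_table(self, obj_z, table_z=0.0):
        # Hypothetical test: object height below the table surface.
        return obj_z < table_z

    def _get_obs(self, obj_z):
        if not self.object_fell:          # once latched, no more checking
            self.checks_run += 1
            if self._object_below_table(obj_z):
                self.object_fell = True
        return {"object_z": obj_z, "object_fell": self.object_fell}
```

Across a full control loop the check then runs at most once after the fall, which also avoids repeatedly tripping the stale-object exception mentioned above.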