Closed learningxiaobai closed 2 years ago
There is no native argument that does this. I suggest you constrain the action instead. You can do this by wrapping the env (with `gym.Wrapper`). Your wrapper would look like this:
```python
import gym


class StayFrontWrapper(gym.Wrapper):
    def reset(self):
        obs = super().reset()
        self._last_obs = obs["observation"]
        return obs

    def step(self, action):
        if not self.is_front():
            action[0] = 0.0  # zero the x displacement
        obs, reward, done, info = super().step(action)
        self._last_obs = obs["observation"]
        return obs, reward, done, info

    def is_front(self):
        return self._last_obs[0] > 0.0  # x end-effector position
```
I haven't tested this code, but it gives you an idea of what I would do if I were you.
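To sanity-check the clamping logic without the simulator, here is a plain-Python rendition of the same idea. `DummyEnv` and the simplified wrapper below are stand-ins for illustration only, not the real panda-gym API:

```python
class DummyEnv:
    """Hypothetical stand-in for the robot env: obs[0] mirrors the last x action."""
    def reset(self):
        return {"observation": [-0.1, 0.0, 0.0]}  # end-effector starts behind x = 0

    def step(self, action):
        obs = {"observation": [action[0], 0.0, 0.0]}
        return obs, 0.0, False, {}


class StayFrontWrapper:
    """Same logic as the gym.Wrapper above, without the gym dependency."""
    def __init__(self, env):
        self.env = env

    def reset(self):
        obs = self.env.reset()
        self._last_obs = obs["observation"]
        return obs

    def step(self, action):
        if self._last_obs[0] <= 0.0:  # not in front
            action[0] = 0.0           # zero the x displacement
        obs, reward, done, info = self.env.step(action)
        self._last_obs = obs["observation"]
        return obs, reward, done, info


env = StayFrontWrapper(DummyEnv())
env.reset()
obs, *_ = env.step([0.5, 0.0, 0.0])
print(obs["observation"][0])  # 0.0 — the x displacement was clamped
```

The key point the sketch shows: `_last_obs` must be refreshed on every `reset` and `step`, otherwise the check uses stale data.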
Feel free to publish the solution you finally choose here; it could help other people.
Thanks a lot!
Hello, I only changed the `RobotTaskEnv` class: I added `def is_front()`, modified `step()` by adding the `if not self.is_front(): action[0] = 0.0  # x displacement` check, and added `self._last_obs = obs["observation"]` in `__init__`. But it doesn't work; the robot still goes back as usual. Do you know why? Thanks!

```python
class RobotTaskEnv(gym.GoalEnv):
    metadata = {"render.modes": ["human", "rgb_array"]}

    def __init__(self):
        self.seed()  # required for init; seed can be changed later
        obs = self.reset()
        observation_shape = obs["observation"].shape
        achieved_goal_shape = obs["achieved_goal"].shape
        desired_goal_shape = obs["achieved_goal"].shape
        self._last_obs = obs["observation"]
        self.observation_space = spaces.Dict(
            dict(
                observation=spaces.Box(-np.inf, np.inf, shape=observation_shape),
                desired_goal=spaces.Box(-np.inf, np.inf, shape=achieved_goal_shape),
                achieved_goal=spaces.Box(-np.inf, np.inf, shape=desired_goal_shape),
            )
        )
        self.action_space = self.robot.action_space
        self.compute_reward = self.task.compute_reward
        self.render = self.sim.render

    def _get_obs(self):
        robot_obs = self.robot.get_obs()  # robot state
        task_obs = self.task.get_obs()  # object position, velocity, etc...
        observation = np.concatenate([robot_obs, task_obs])
        achieved_goal = self.task.get_achieved_goal()
        return {
            "observation": observation,
            "achieved_goal": achieved_goal,
            "desired_goal": self.task.get_goal(),
        }

    def reset(self):
        with self.sim.no_rendering():
            self.robot.reset()
            self.task.reset()
        return self._get_obs()

    def step(self, action):
        if not self.is_front():
            action[0] = 0.0  # x displacement
        self.robot.set_action(action)
        self.sim.step()
        obs = self._get_obs()
        done = False
        info = {
            "is_success": self.task.is_success(obs["achieved_goal"], self.task.get_goal()),
        }
        reward = self.task.compute_reward(obs["achieved_goal"], self.task.get_goal(), info)
        return obs, reward, done, info

    def seed(self, seed=None):
        """Setup the seed."""
        return self.task.seed(seed)

    def close(self):
        self.sim.close()

    def is_front(self):
        if self._last_obs[0] > 0:  # x ee position
            return True
        else:
            return False
```
Have you checked whether the `is_front` method works as expected? You can print something when `is_front` returns `True` and compare with what you observe in the rendering.
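One way to do that kind of check is a standalone version of the test, which you can call on printed observations without touching the env. This helper is just an illustration, not part of any library:

```python
def is_front(last_obs, verbose=True):
    """Standalone version of the wrapper's check, handy for quick debugging."""
    in_front = last_obs[0] > 0.0  # x end-effector position
    if verbose and in_front:
        print(f"is_front: True (x = {last_obs[0]:.3f})")
    return in_front


print(is_front([0.2, 0.0, 0.0]))   # prints the debug line, then True
print(is_front([-0.1, 0.0, 0.0]))  # False, nothing printed
```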
Thanks, I will try.
@qgallouedec, thanks a lot! I put `self._last_obs = obs["observation"]` in `_get_obs`, not in `__init__`, and it works. Btw, do you know any useful RL libraries (other than SB3, which takes a long time for me to train; maybe the environment is too hard), or do you write the RL algorithms yourself?
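For anyone landing here later, the root cause was that `_last_obs` was only captured once in `__init__`, so `is_front()` always saw the initial pose. A minimal pure-Python illustration of the difference (class names are made up for this sketch):

```python
class StaleCheck:
    """Captures the observation once at init, like the broken version."""
    def __init__(self):
        self._last_obs = [-0.1]  # initial end-effector x

    def observe(self, x):
        return {"observation": [x]}  # _last_obs is never refreshed

    def is_front(self):
        return self._last_obs[0] > 0.0


class FreshCheck:
    """Refreshes _last_obs on every observation, like the fix in _get_obs."""
    def __init__(self):
        self._last_obs = [-0.1]

    def observe(self, x):
        obs = {"observation": [x]}
        self._last_obs = obs["observation"]  # refreshed inside _get_obs
        return obs

    def is_front(self):
        return self._last_obs[0] > 0.0


stale, fresh = StaleCheck(), FreshCheck()
stale.observe(0.5)
fresh.observe(0.5)  # the end-effector has moved to x = 0.5
print(stale.is_front(), fresh.is_front())  # False True
```

With the stale check, the clamp stays active forever, which is exactly the "robot still goes back" symptom from earlier in the thread.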
I use mainly SB3. You can speed up learning by using multiprocessing, but AFAIK it is not released yet for off-policy algorithms (soon, maybe?). Check https://github.com/DLR-RM/stable-baselines3/issues/179#issue-715804276, it can help.
Hello, is it possible to limit the range of movement of the robot? The target is in the front, but the robot explores the back (and other places), which is inefficient, like this:
Thanks a lot.