Closed learningxiaobai closed 2 years ago
There is no native argument that does this. I suggest you constrain the action instead. You can do this by wrapping the env (with `gym.Wrapper`). Your wrapper would look like this:
```python
import gym


class StayFrontWrapper(gym.Wrapper):
    def reset(self):
        obs = super().reset()
        self._last_obs = obs["observation"]
        return obs

    def step(self, action):
        if not self.is_front():
            action[0] = 0.0  # zero the x displacement
        obs, reward, done, info = super().step(action)
        self._last_obs = obs["observation"]
        return obs, reward, done, info

    def is_front(self):
        return self._last_obs[0] > 0.0  # x end-effector position
```
I haven't tested this code, but it gives you an idea of what I would do if I were you.
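To sanity-check the clamping logic without the simulator, here is a plain-Python rendition of the same idea. `DummyEnv` and the simplified wrapper below are stand-ins for illustration only, not the real panda-gym API:

```python
class DummyEnv:
    """Hypothetical stand-in for the robot env: obs[0] mirrors the last x action."""
    def reset(self):
        return {"observation": [-0.1, 0.0, 0.0]}  # end-effector starts behind x = 0

    def step(self, action):
        obs = {"observation": [action[0], 0.0, 0.0]}
        return obs, 0.0, False, {}


class StayFrontWrapper:
    """Same logic as the gym.Wrapper above, without the gym dependency."""
    def __init__(self, env):
        self.env = env

    def reset(self):
        obs = self.env.reset()
        self._last_obs = obs["observation"]
        return obs

    def step(self, action):
        if self._last_obs[0] <= 0.0:  # not in front
            action[0] = 0.0           # zero the x displacement
        obs, reward, done, info = self.env.step(action)
        self._last_obs = obs["observation"]
        return obs, reward, done, info


env = StayFrontWrapper(DummyEnv())
env.reset()
obs, *_ = env.step([0.5, 0.0, 0.0])
print(obs["observation"][0])  # 0.0 — the x displacement was clamped
```

The key point the sketch shows: `_last_obs` must be refreshed on every `reset` and `step`, otherwise the check uses stale data.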
Feel free to publish the solution you finally choose here; it could help other people.
Thanks a lot!
Hello, I only changed the `RobotTaskEnv` class: I added `def is_front()`, modified `step()` by adding the `if not self.is_front(): action[0] = 0.0  # x displacement` check, and added `self._last_obs = obs["observation"]` in `__init__`. But it doesn't work; the robot still goes back as usual. Do you know why? Thanks!

```python
class RobotTaskEnv(gym.GoalEnv):
    metadata = {"render.modes": ["human", "rgb_array"]}

    def __init__(self):
        self.seed()  # required for init; seed can be changed later
        obs = self.reset()
        observation_shape = obs["observation"].shape
        achieved_goal_shape = obs["achieved_goal"].shape
        desired_goal_shape = obs["achieved_goal"].shape
        self._last_obs = obs["observation"]
        self.observation_space = spaces.Dict(
            dict(
                observation=spaces.Box(-np.inf, np.inf, shape=observation_shape),
                desired_goal=spaces.Box(-np.inf, np.inf, shape=achieved_goal_shape),
                achieved_goal=spaces.Box(-np.inf, np.inf, shape=desired_goal_shape),
            )
        )
        self.action_space = self.robot.action_space
        self.compute_reward = self.task.compute_reward
        self.render = self.sim.render

    def _get_obs(self):
        robot_obs = self.robot.get_obs()  # robot state
        task_obs = self.task.get_obs()  # object position, velocity, etc...
        observation = np.concatenate([robot_obs, task_obs])
        achieved_goal = self.task.get_achieved_goal()
        return {
            "observation": observation,
            "achieved_goal": achieved_goal,
            "desired_goal": self.task.get_goal(),
        }

    def reset(self):
        with self.sim.no_rendering():
            self.robot.reset()
            self.task.reset()
        return self._get_obs()

    def step(self, action):
        if not self.is_front():
            action[0] = 0.0  # x displacement
        self.robot.set_action(action)
        self.sim.step()
        obs = self._get_obs()
        done = False
        info = {
            "is_success": self.task.is_success(obs["achieved_goal"], self.task.get_goal()),
        }
        reward = self.task.compute_reward(obs["achieved_goal"], self.task.get_goal(), info)
        return obs, reward, done, info

    def seed(self, seed=None):
        """Setup the seed."""
        return self.task.seed(seed)

    def close(self):
        self.sim.close()

    def is_front(self):
        if self._last_obs[0] > 0:  # x ee position
            return True
        else:
            return False
```
Have you checked whether the `is_front` method works as expected? You can print something when `is_front` returns `True` and compare with what you observe in the rendering.
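One way to do that kind of check is a standalone version of the test, which you can call on printed observations without touching the env. This helper is just an illustration, not part of any library:

```python
def is_front(last_obs, verbose=True):
    """Standalone version of the wrapper's check, handy for quick debugging."""
    in_front = last_obs[0] > 0.0  # x end-effector position
    if verbose and in_front:
        print(f"is_front: True (x = {last_obs[0]:.3f})")
    return in_front


print(is_front([0.2, 0.0, 0.0]))   # prints the debug line, then True
print(is_front([-0.1, 0.0, 0.0]))  # False, nothing printed
```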
Thanks, I will try.
@qgallouedec, thanks a lot! I put `self._last_obs = obs["observation"]` in `_get_obs`, not in `__init__`, and it works. Btw, do you know any useful RL libraries (other than SB3, which takes a long time for me to train; maybe the environment is too hard), or do you write the RL algorithms yourself?
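For anyone landing here later, the root cause was that `_last_obs` was only captured once in `__init__`, so `is_front()` always saw the initial pose. A minimal pure-Python illustration of the difference (class names are made up for this sketch):

```python
class StaleCheck:
    """Captures the observation once at init, like the broken version."""
    def __init__(self):
        self._last_obs = [-0.1]  # initial end-effector x

    def observe(self, x):
        return {"observation": [x]}  # _last_obs is never refreshed

    def is_front(self):
        return self._last_obs[0] > 0.0


class FreshCheck:
    """Refreshes _last_obs on every observation, like the fix in _get_obs."""
    def __init__(self):
        self._last_obs = [-0.1]

    def observe(self, x):
        obs = {"observation": [x]}
        self._last_obs = obs["observation"]  # refreshed inside _get_obs
        return obs

    def is_front(self):
        return self._last_obs[0] > 0.0


stale, fresh = StaleCheck(), FreshCheck()
stale.observe(0.5)
fresh.observe(0.5)  # the end-effector has moved to x = 0.5
print(stale.is_front(), fresh.is_front())  # False True
```

With the stale check, the clamp stays active forever, which is exactly the "robot still goes back" symptom from earlier in the thread.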
I use mainly SB3. You can speed up learning by using multiprocessing, but AFAIK it is not released yet for off-policy algorithms (soon, maybe?). Check https://github.com/DLR-RM/stable-baselines3/issues/179#issue-715804276, it can help.
Hello, is it possible to limit the range of movement of the robot? The target is in the front, but the robot explores the back (and other places), which is inefficient, like this:
Thanks a lot.