qgallouedec / panda-gym

Set of robotic environments based on PyBullet physics engine and gymnasium.
MIT License

Question about task observation #61

Closed. kuntoro-adi closed this issue 1 year ago

kuntoro-adi commented 1 year ago

Hi,

I am curious about the task observations used in the environments. I am sorry if my question is trivial; I am new to reinforcement learning. The task observation of the pick-and-place task consists of the object's kinematics (position, velocity, etc.):

# position, rotation of the object
object_position = self.sim.get_base_position("object")
object_rotation = self.sim.get_base_rotation("object")
object_velocity = self.sim.get_base_velocity("object")
object_angular_velocity = self.sim.get_base_angular_velocity("object")
observation = np.concatenate([object_position, object_rotation, object_velocity, object_angular_velocity])

In PandaReach, the task observation is even empty:

def get_obs(self) -> np.ndarray:
    return np.array([])  # no task-specific observation

Why is the target position not included in the observation? For example:

target_position = self.sim.get_base_position("target")
object_position = self.sim.get_base_position("object")
...
observation = np.concatenate([target_position, object_position, ...])

Does this mean that the critic networks in the RL algorithms (SAC or TQC) are also learning to predict the random target location? If not, for example in the pick-and-place task, does the agent still need to search randomly for the position with maximum reward after successfully picking up the object when the trained model is tested?

Thank you very much.

qgallouedec commented 1 year ago

In fact, the actual observation that the agent receives is composed of several components (in the form of a dict).

observation = {
  "observation": ...,
  "desired_goal": ..., 
  "achieved_goal": ...,
}
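
To make the dict structure concrete, here is a minimal sketch in plain Python (no panda-gym import). The numeric values and vector lengths are made up for illustration, and the helper names `flatten_goal_observation` and `goal_distance` are hypothetical, not part of panda-gym. It assumes that "desired_goal" holds the target position and "achieved_goal" the position currently reached, which is how the target becomes visible to the agent even when the task-specific observation is empty.

```python
import math

# Hypothetical values shaped like a reach-style task; the numbers are made up.
obs = {
    "observation": [0.04, 0.00, 0.20, 0.0, 0.0, 0.0],  # assumed: robot state (plus task state, if any)
    "desired_goal": [0.10, -0.05, 0.15],               # assumed: target position for this episode
    "achieved_goal": [0.04, 0.00, 0.20],               # assumed: position currently reached
}


def flatten_goal_observation(obs):
    """Concatenate the dict entries in a fixed key order into one flat input vector."""
    return (
        list(obs["observation"])
        + list(obs["achieved_goal"])
        + list(obs["desired_goal"])
    )


def goal_distance(obs):
    """Euclidean distance between the achieved goal and the desired goal."""
    return math.dist(obs["achieved_goal"], obs["desired_goal"])


policy_input = flatten_goal_observation(obs)
print(len(policy_input))    # total length of the flat vector fed to a network
print(goal_distance(obs))   # how far the achieved goal is from the target
```

Because the desired goal is part of what the networks receive, a policy or critic that consumes all three keys does not have to guess the target location; it is given at every step.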

Several remarks:

qgallouedec commented 1 year ago

All of this is explained in the publication linked to panda-gym. I strongly advise you to read it (especially the diagram).