In fact, the observation that the agent receives is composed of several components (in the form of a dict):
observation = {
    "observation": ...,
    "desired_goal": ...,
    "achieved_goal": ...,
}
"observation"
, there are two components:
"desired_goal"
, this is the goal of the task. For example, the target position of the object for the PickAndPlace task."achieved_goal"
, it is the goal achieved at time t. For example for PickAndPlace, it is the current position of the object.Several remarks:
"achieved_goal"
is usually redundant with the components of the observation. For example, for PickAndPlace, the position of the object is in the observation vector and in the "achieved_goal"
vector.All this is explained in the publication linked to panda-gym, I strongly advise you to read it (especially the diagram)
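As a minimal sketch (assuming panda-gym v3 with the gymnasium API; the env id and key order may vary by version), you can inspect these components directly:

import gymnasium as gym
import panda_gym

env = gym.make("PandaPickAndPlace-v3")
observation, info = env.reset()
print(observation.keys())            # the three components listed above
print(observation["desired_goal"])   # target position sampled for this episode
print(observation["achieved_goal"])  # current position of the object

Note also that goal-conditioned algorithms consume the whole dict. With Stable-Baselines3, for instance, "MultiInputPolicy" flattens it so that "desired_goal" is part of the input to both the actor and the critic:

from stable_baselines3 import SAC

model = SAC("MultiInputPolicy", env)  # desired_goal is fed to the networks, not predicted by them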
Hi,
I am curious about the task observations used in the environment. I am sorry if my question is trivial; I am new to reinforcement learning. The observation states of the pick-and-place task are the object's kinematics (position, velocity, etc.):
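(A sketch of the task observation in question, paraphrasing panda-gym's PickAndPlace.get_obs; details may differ between versions:)

import numpy as np

def get_obs(self) -> np.ndarray:
    # object kinematics: position, rotation, velocity, angular velocity
    object_position = np.array(self.sim.get_base_position("object"))
    object_rotation = np.array(self.sim.get_base_rotation("object"))
    object_velocity = np.array(self.sim.get_base_velocity("object"))
    object_angular_velocity = np.array(self.sim.get_base_angular_velocity("object"))
    return np.concatenate([object_position, object_rotation, object_velocity, object_angular_velocity])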
Even in PandaReach, the task observation is empty:
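(In the Reach task, get_obs returns an empty array, along these lines:)

def get_obs(self) -> np.ndarray:
    # no task-specific observation for Reach
    return np.array([])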
Why is the target position not included in the observation? Such as:
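(Something like this hypothetical variant, which appends the goal to the task observation; self.goal here stands for the sampled target position:)

def get_obs(self) -> np.ndarray:
    object_position = np.array(self.sim.get_base_position("object"))
    # hypothetical: include the target position directly in the observation
    return np.concatenate([object_position, self.goal])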
Does this mean that the critic networks in the RL algorithms (SAC or TQC) are basically also learning to predict the random target location? If not, in the pick-and-place task for example, does the agent still need to randomly search for the position with maximum reward after successfully picking up the object when testing the trained model?
Thank you very much.