In fact, the observation that the agent receives is composed of several components (in the form of a dict):
observation = {
    "observation": ...,
    "desired_goal": ...,
    "achieved_goal": ...,
}
"observation"
, there are two components:
"desired_goal"
, this is the goal of the task. For example, the target position of the object for the PickAndPlace task."achieved_goal"
, it is the goal achieved at time t. For example for PickAndPlace, it is the current position of the object.Several remarks:
"achieved_goal"
is usually redundant with the components of the observation. For example, for PickAndPlace, the position of the object is in the observation vector and in the "achieved_goal"
vector.All this is explained in the publication linked to panda-gym, I strongly advise you to read it (especially the diagram)
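As a minimal sketch (assuming panda-gym v3 with the gymnasium API; the env id and key order may vary by version), you can inspect these components directly:

import gymnasium as gym
import panda_gym

env = gym.make("PandaPickAndPlace-v3")
observation, info = env.reset()
print(observation.keys())            # the three components listed above
print(observation["desired_goal"])   # target position sampled for this episode
print(observation["achieved_goal"])  # current position of the object

Note also that goal-conditioned algorithms consume the whole dict. With Stable-Baselines3, for instance, "MultiInputPolicy" flattens it so that "desired_goal" is part of the input to both the actor and the critic:

from stable_baselines3 import SAC

model = SAC("MultiInputPolicy", env)  # desired_goal is fed to the networks, not predicted by them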
Hi,
I am curious about the task observations used in the environment. I am sorry if my question is trivial; I am new to reinforcement learning. The observation states of the pick-and-place task are the object's kinematics (position, velocity, etc.):
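(A sketch of the task observation in question, paraphrasing panda-gym's PickAndPlace.get_obs; details may differ between versions:)

import numpy as np

def get_obs(self) -> np.ndarray:
    # object kinematics: position, rotation, velocity, angular velocity
    object_position = np.array(self.sim.get_base_position("object"))
    object_rotation = np.array(self.sim.get_base_rotation("object"))
    object_velocity = np.array(self.sim.get_base_velocity("object"))
    object_angular_velocity = np.array(self.sim.get_base_angular_velocity("object"))
    return np.concatenate([object_position, object_rotation, object_velocity, object_angular_velocity])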
Even in PandaReach, the task observation is empty:
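(In the Reach task, get_obs returns an empty array, along these lines:)

def get_obs(self) -> np.ndarray:
    # no task-specific observation for Reach
    return np.array([])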
Why is the target position not included in the observation? Such as:
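(Something like this hypothetical variant, which appends the goal to the task observation; self.goal here stands for the sampled target position:)

def get_obs(self) -> np.ndarray:
    object_position = np.array(self.sim.get_base_position("object"))
    # hypothetical: include the target position directly in the observation
    return np.concatenate([object_position, self.goal])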
Does this mean that the critic networks in the RL algorithms (SAC or TQC) are basically also learning to predict the random target location? If not, in the pick-and-place task for example, does the agent still need to randomly search for the position with maximum reward after successfully picking up the object when testing the trained model?
Thank you very much.