qgallouedec / panda-gym

Set of robotic environments based on PyBullet physics engine and gymnasium.
MIT License

Clarification about 'observation', 'achieved_goal' and 'desired_goal' #87

Closed wilhem closed 5 months ago

wilhem commented 7 months ago

I'm trying to implement a simulation environment using panda-gym, but I couldn't find enough information. I'm still confused about the difference among the following: observation["observation"], observation["achieved_goal"], and observation["desired_goal"]. Initially I thought that observation["observation"] contains the joint values of the Panda. But then I saw this example here, where current_position takes the first 3 elements of observation["observation"]. That means that observation["observation"][0:3] contains the x, y, z position of the gripper. Is that right? Then I have a question about observation["achieved_goal"]. What exactly is achieved_goal? Does it contain the last achieved goal and get updated once a new goal has been reached? Or is it the x, y, z position of the gripper?

Many thanks

qgallouedec commented 7 months ago

Hi,

check https://github.com/qgallouedec/panda-gym/issues/61#issuecomment-1492878615 and https://github.com/qgallouedec/panda-gym/issues/8#issuecomment-911512499

Ping if it doesn't answer your question :)

wilhem commented 7 months ago

Hi, thank you very much for your answer. I found the paper you linked very interesting, but there is a point which makes me unsure. You wrote the following:

For Reach, there is no component linked to the task because there is no object (in this case, the terminology is a bit misleading because there is still a task to perform, but this is a special case so we left it like that)

How should I understand that statement? Does the desired_goal array NOT contain the position of the object to be reached when using the Reach environment?

By the way: is the list of observable features still valid? Or has it changed? The list is for -v1, and now we have -v3.

qgallouedec commented 7 months ago

How should I understand that statement? Does the desired_goal array NOT contain the position of the object to be reached when using the Reach environment?

This is a design-related remark. I wanted to dissociate the task from the robot. Thus, the criterion for achieving the task should not depend on the robot's state. For example, if the task is to push an object to a target position, the only thing that matters is whether the object's position matches the desired position. Reach is special because the task involves the robot's state, but it's meant to be a special case. In the observation, desired_goal is the gripper's target position (to be precise, it's a position, not an object to be reached), and achieved_goal is the gripper's position.
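The split described above can be sketched with a toy observation. The values and the exact robot-state layout here are illustrative, not taken from the library; the point is that "observation" concatenates a robot part with a task part, while the goal keys depend only on the task (with Reach as the special case):

```python
import numpy as np

# Illustrative robot state: end-effector position (3) + velocity (3).
robot_obs = np.array([0.04, -0.02, 0.20, 0.0, 0.0, 0.0])
# Reach has no object, so the task-specific part is empty.
task_obs = np.array([])

obs = {
    # "observation" = robot part + task part.
    "observation": np.concatenate([robot_obs, task_obs]),
    # Special case for Reach: the goal is the gripper position itself.
    "achieved_goal": robot_obs[:3],
    # Target position for the gripper (a position, not an object).
    "desired_goal": np.array([0.10, 0.05, 0.15]),
}
```

For a task with an object (e.g. Push), task_obs would hold the object's state and the goal keys would refer to the object's position instead.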

By the way: is the list of observable features still valid? Or it has changed? The list is vor -v1 and now we have -v3

Yes, still valid

wilhem commented 7 months ago

How should I understand that statement? Does the desired_goal array NOT contain the position of the object to be reached when using the Reach environment? Reach is special because the task involves the robot's state, but it's meant to be a special case. In the observation, desired_goal is the gripper's target position (to be precise, it's a position, not an object to be reached), and achieved_goal is the gripper's position.

Sorry again, but this last statement completely confused me. I'm using the Reach task. In that case the gripper should reach a green object on the working table. A reward of 0.0 is granted when the distance between the gripper and the object is less than 5 cm. Among 'observation', 'achieved_goal', and 'desired_goal', which one:

  1. is the position of the green object?
  2. is the position of the gripper?
  3. are the first 3 values of 'observation' the actual position of the gripper?

qgallouedec commented 7 months ago

Sorry again, but this last statement completely confused me.

Don't worry, if you're asking the question, it's not clear enough.

I'm using the Reach task. In that case the gripper should reach a green object on the working table

Yes, two details though,

A reward of 0.0 is granted when the distance between the gripper and the object is less than 5 cm.

True
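A minimal sketch of that sparse scheme, assuming the 5 cm threshold from the thread. The function name and signature here are illustrative, not panda-gym's actual API, though they mirror the usual goal-conditioned compute_reward idea:

```python
import numpy as np

def sparse_reward(achieved_goal, desired_goal, threshold=0.05):
    # 0.0 when the achieved goal is within `threshold` (5 cm) of the
    # desired goal, -1.0 otherwise (illustrative sparse scheme).
    d = np.linalg.norm(np.asarray(achieved_goal) - np.asarray(desired_goal), axis=-1)
    return np.where(d > threshold, -1.0, 0.0)

# Gripper 3 cm from the target -> success, reward 0.0
print(sparse_reward([0.0, 0.0, 0.0], [0.03, 0.0, 0.0]))
# Gripper 10 cm from the target -> reward -1.0
print(sparse_reward([0.0, 0.0, 0.0], [0.10, 0.0, 0.0]))
```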

which one:

  1. is the position of the green object?

desired_goal

  2. is the position of the gripper?

achieved_goal

  3. are the first 3 values of 'observation' the actual position of the gripper?

Yes
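Putting the three answers together: a hypothetical helper (unpack_reach_obs is not part of panda-gym, and the toy values below are made up) could read a Reach observation like this:

```python
import numpy as np

def unpack_reach_obs(obs):
    # 1. desired_goal: position of the green target
    target_pos = obs["desired_goal"]
    # 2. achieved_goal: current gripper (end-effector) position
    gripper_pos = obs["achieved_goal"]
    # 3. ...which duplicates the first three "observation" entries
    assert np.allclose(gripper_pos, obs["observation"][:3])
    return gripper_pos, target_pos

# Toy observation with the assumed layout:
obs = {
    "observation":   np.array([0.04, -0.02, 0.20, 0.0, 0.0, 0.0]),
    "achieved_goal": np.array([0.04, -0.02, 0.20]),
    "desired_goal":  np.array([0.10,  0.05, 0.15]),
}
gripper, target = unpack_reach_obs(obs)
```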