openai / gym

A toolkit for developing and comparing reinforcement learning algorithms.
https://www.gymlibrary.dev

In the MuJoCo Hopper-v2 environment, what are the names of the observations and actions, and what do they represent? #2173

Closed: hellfireworld closed this issue 2 years ago

hellfireworld commented 3 years ago

In MuJoCo environments such as Hopper-v2, what are the names of the observations and actions? (If you can help, I would like to see them in the order the environment returns them.) For example, I want to train a neural network on these features, and when I save them to a CSV file with pandas I would like to give the observation and action columns their real names; for now I have named them var1, var2, ..., var11, action1, action2, action3. Knowing the names would help me understand the environment better. Any documentation or wiki that lists them would be appreciated, because the only thing I found was this issue https://github.com/openai/gym/issues/971, and it describes... how I can see (only) the values after they have been concatenated.

PS: Maybe they have something to do with qpos (x, y, z) and qvel.

Thank you.

lishanwu135 commented 3 years ago

Good question! I am also interested in this question. I would like to know the physical meaning of each variable. Hope someone will answer this question soon.

WillDudley commented 2 years ago

Although Issue #971 answers the question simply, I'll try to explain how to infer what they mean, without relying too heavily on trial-and-error.

TL;DR: "At runtime the positions and orientations of all joints defined in the model are stored in the vector mjData.qpos, in the order in which they appear in the kinematic tree. The linear and angular velocities are stored in the vector mjData.qvel. These two vectors have different dimensionality when free or ball joints are used, because such joints represent rotations as unit quaternions." (source: MuJoCo documentation). Hence, qpos[0] is the position of the first joint defined in hopper.xml (for Hopper this is the rootx slide joint, so it is a linear rather than an angular position). Actions correspond to actuators, as mentioned in issue #971.

See here for an official "hint" of what qpos and qvel mean: they are the joint positions and joint velocities, respectively (angular for hinge joints, linear for slide joints).
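To make the ordering rule concrete, here is a minimal sketch that walks an MJCF file and labels every qpos and qvel entry. It uses only the standard library's xml.etree.ElementTree, not MuJoCo itself. The XML string is a hypothetical, trimmed excerpt modelled on the joints in hopper.xml (check names and types against the real file), and `joint_layout`/`expand` are illustrative helpers, not part of gym or MuJoCo. It assumes every body declares its joints before its child bodies and that no `<default>` class changes the joint type; a free or ball joint shows why qpos and qvel can differ in length.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed excerpt modelled on the joints in Hopper's hopper.xml;
# verify names and types against the real file before relying on them.
HOPPER_JOINTS_XML = """
<mujoco>
  <worldbody>
    <body name="torso">
      <joint name="rootx" type="slide"/>
      <joint name="rootz" type="slide"/>
      <joint name="rooty" type="hinge"/>
      <body name="thigh">
        <joint name="thigh_joint" type="hinge"/>
        <body name="leg">
          <joint name="leg_joint" type="hinge"/>
          <body name="foot">
            <joint name="foot_joint" type="hinge"/>
          </body>
        </body>
      </body>
    </body>
  </worldbody>
</mujoco>
"""

# Entries each joint type contributes to qpos / qvel in MuJoCo:
# free = position + unit quaternion (7) / linear + angular velocity (6),
# ball = unit quaternion (4) / angular velocity (3), slide and hinge = 1 / 1.
QPOS_SIZE = {"free": 7, "ball": 4, "slide": 1, "hinge": 1}
QVEL_SIZE = {"free": 6, "ball": 3, "slide": 1, "hinge": 1}


def expand(name, size):
    """One label per vector entry: 'name' if scalar, else 'name[0]', 'name[1]', ..."""
    return [name] if size == 1 else [f"{name}[{i}]" for i in range(size)]


def joint_layout(mjcf):
    """Return (qpos_labels, qvel_labels) in kinematic-tree order.

    Assumes each body declares its joints before its child bodies (as in
    hopper.xml), so a depth-first walk of <worldbody> visits joints in the
    order MuJoCo stores them. <freejoint/> elements and <default> classes
    that override the joint type are not handled.
    """
    worldbody = ET.fromstring(mjcf).find("worldbody")
    qpos, qvel = [], []
    for joint in worldbody.iter("joint"):
        jtype = joint.get("type", "hinge")  # MuJoCo's built-in default type is hinge
        qpos += expand(joint.get("name"), QPOS_SIZE[jtype])
        qvel += expand(joint.get("name"), QVEL_SIZE[jtype])
    return qpos, qvel


qpos_labels, qvel_labels = joint_layout(HOPPER_JOINTS_XML)
print(qpos_labels)  # ['rootx', 'rootz', 'rooty', 'thigh_joint', 'leg_joint', 'foot_joint']
```

Parsing the XML rather than hard-coding names means the same helper can be pointed at other MuJoCo models, as long as the stated assumptions hold.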

Actions

Inspecting the code shows that calling step() invokes MjSim.step() (docs here for the Python binding and here for the raw C function). This uses the MjData struct, and more specifically its control attributes listed here*, which expect nu elements: one per actuator. The actuators are the ones defined here, in the order they are declared.

*This explains why the environment sets MjSim.data.ctrl to the action as part of the step.
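Since entry i of the action ends up in ctrl[i], and ctrl has one slot per actuator in declaration order, action columns can be named by reading the `<actuator>` section. A minimal sketch, again with only the standard library: the XML is a hypothetical excerpt modelled on hopper.xml (three motors driving thigh_joint, leg_joint and foot_joint; the real file may differ in attributes), and `action_names`/`label_action` are illustrative helpers, not gym or MuJoCo APIs.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed <actuator> section modelled on hopper.xml; check the
# real file for the exact motors, attributes and whether they carry names.
HOPPER_ACTUATOR_XML = """
<mujoco>
  <actuator>
    <motor joint="thigh_joint" ctrlrange="-1.0 1.0" gear="200.0"/>
    <motor joint="leg_joint" ctrlrange="-1.0 1.0" gear="200.0"/>
    <motor joint="foot_joint" ctrlrange="-1.0 1.0" gear="200.0"/>
  </actuator>
</mujoco>
"""


def action_names(mjcf):
    """Name each action entry after the actuator it drives, in declaration order."""
    actuator = ET.fromstring(mjcf).find("actuator")
    names = []
    for act in actuator:  # direct children: motor, position, velocity, general, ...
        # Prefer an explicit name attribute; otherwise fall back to the driven joint.
        names.append(act.get("name") or f"{act.tag}:{act.get('joint')}")
    return names


def label_action(action, names):
    """Pair each action value with its actuator; the length must equal nu."""
    if len(action) != len(names):
        raise ValueError(f"expected {len(names)} action entries (nu), got {len(action)}")
    return dict(zip(names, action))


names = action_names(HOPPER_ACTUATOR_XML)
print(label_action([0.5, -0.1, 0.0], names))
# {'motor:thigh_joint': 0.5, 'motor:leg_joint': -0.1, 'motor:foot_joint': 0.0}
```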

Observations

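Observations are retrieved here. As mentioned in the TL;DR, they correspond to the joints defined in the XML file: for Hopper, every qpos entry except the first (rootx), followed by every qvel entry. This also explains the termination check, which reads the torso height and angle from the root joints defined here (qpos[1] is the rootz slide joint and qpos[2] the rooty hinge joint).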
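The Hopper-v2 observation layout can be mirrored in a few lines. As I read hopper.py, `_get_obs` concatenates `qpos.flat[1:]` with `np.clip(qvel.flat, -10, 10)`, and `step` unpacks the torso height and angle from `qpos` for its termination test. The sketch below reproduces that layout in plain Python (no NumPy) so the 11 column names can be checked; the joint list is assumed from hopper.xml and the function names are hypothetical.

```python
# Joint names in qpos/qvel order, assumed from hopper.xml (only slide and
# hinge joints, so qpos and qvel both have 6 entries). Check the real file.
HOPPER_JOINTS = ["rootx", "rootz", "rooty", "thigh_joint", "leg_joint", "foot_joint"]


def hopper_obs_names(joints=HOPPER_JOINTS):
    """Column names matching the 11-dimensional Hopper-v2 observation."""
    # qpos[0] (rootx, the forward position) is not part of the observation.
    return [f"qpos_{j}" for j in joints[1:]] + [f"qvel_{j}" for j in joints]


def hopper_obs(qpos, qvel):
    """Plain-Python mirror of the concatenation: qpos[1:], then qvel clipped to [-10, 10]."""
    return list(qpos[1:]) + [max(-10.0, min(10.0, v)) for v in qvel]


def termination_inputs(qpos):
    """The two qpos entries the termination check reads: torso height and angle."""
    height, ang = qpos[1:3]  # rootz (slide joint), rooty (hinge joint)
    return height, ang


print(hopper_obs_names())
```

The first five names are positions and the last six are velocities, which matches the var1 ... var11 columns in the question.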

Naming actions and observations

Now that we know what qpos and qvel contain and in what order, we can give the states and actions meaningful names, e.g. the joint names used in hopper.xml. Hopefully this is enough to get you started naming states and actions in MuJoCo or Bullet environments; however, always validate your names, e.g. by printing states alongside the rendered frames.
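For the CSV use case in the question, a minimal sketch of writing named columns and printing values next to their names for validation. It uses the standard `csv` module so it runs without pandas; with pandas, `pd.DataFrame(rows, columns=OBS_NAMES + ACT_NAMES).to_csv(path, index=False)` would write the same header. The column names assume the Hopper-v2 layout (qpos[1:] then qvel, three actuated joints) and the `act_` prefix, `write_transitions` and `describe` are hypothetical choices for illustration.

```python
import csv
import io

# Assumed Hopper-v2 layout: qpos[1:] then qvel for the observation, and one
# action per actuated joint. The "act_" prefix is a hypothetical naming choice.
JOINTS = ["rootx", "rootz", "rooty", "thigh_joint", "leg_joint", "foot_joint"]
OBS_NAMES = [f"qpos_{j}" for j in JOINTS[1:]] + [f"qvel_{j}" for j in JOINTS]
ACT_NAMES = [f"act_{j}" for j in JOINTS[3:]]


def write_transitions(transitions, fh):
    """Write (obs, action) pairs as CSV with named columns instead of var1, action1, ..."""
    writer = csv.writer(fh)
    writer.writerow(OBS_NAMES + ACT_NAMES)
    for obs, act in transitions:
        # Fail loudly if the environment's vectors do not match the name lists.
        if len(obs) != len(OBS_NAMES) or len(act) != len(ACT_NAMES):
            raise ValueError(f"got {len(obs)} obs / {len(act)} action values")
        writer.writerow(list(obs) + list(act))


def describe(obs):
    """Print each observation value next to its name, e.g. while watching env.render()."""
    for name, value in zip(OBS_NAMES, obs):
        print(f"{name:>18} = {value: .3f}")


buf = io.StringIO()
write_transitions([([0.0] * 11, [0.1, -0.2, 0.3])], buf)
print(buf.getvalue().splitlines()[0])
```

Checking lengths on every row catches the most common mistake early: a name list written for one environment version and applied to another.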

jkterry1 commented 2 years ago

PR #2762 is about to be merged; it introduces V4 MuJoCo environments that use the new bindings and a much newer version of the engine. If this issue persists with the V4 environments, please open a new issue for it.