How does policy handle privileged observations in real-world?

roboterax / humanoid-gym

Humanoid-Gym: Reinforcement Learning for Humanoid Robot with Zero-Shot Sim2Real Transfer https://arxiv.org/abs/2404.05695

https://sites.google.com/view/humanoid-gym/


How does policy handle privileged observations in real-world? #12

Closed Sjey-Lyn closed 5 months ago

Sjey-Lyn commented 5 months ago

Dear author.

Thank you for your awesome work. How does the policy handle privileged observations in the real world? Also, why do the privileged observations have 73 dimensions in Table 3 but a different number in Table 1 of the paper? What do the privileged observations include?

wangyenjen commented 5 months ago

Hi @Sjey-Lyn ,

Thank you for your careful proofreading of our paper! The privileged observation is not directly utilized during deployment; it is only used in the training stage for the value function (critic) in reinforcement learning. The privileged observations correspond to the state described in Table I, and therefore, the dimensions are the same as those listed in Table III. Feel free to ask any further questions. : )
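To make the actor/critic split described above concrete, here is a minimal sketch in plain Python. It is not the Humanoid-Gym implementation: the single linear layers, the helper names (`make_linear`, `apply_linear`, `act`, `value`), the action size of 12, and the assumption that the observation is the first 47 entries of the state are all hypothetical. Only the sizes 47 and 73 are the ones quoted in this thread.

```python
import random

OBS_DIM = 47    # actor input size, as quoted in this thread
STATE_DIM = 73  # critic (privileged) input size, as quoted in this thread
ACT_DIM = 12    # hypothetical action size, chosen only for illustration


def make_linear(in_dim, out_dim, rng):
    """Return a random weight matrix (out_dim rows of in_dim weights)."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)]
            for _ in range(out_dim)]


def apply_linear(weights, x):
    """Multiply a weight matrix by an input vector, checking the size."""
    if len(x) != len(weights[0]):
        raise ValueError(f"expected {len(weights[0])} inputs, got {len(x)}")
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


rng = random.Random(0)
actor_w = make_linear(OBS_DIM, ACT_DIM, rng)  # policy: observation -> action
critic_w = make_linear(STATE_DIM, 1, rng)     # value: privileged state -> scalar


def act(obs):
    """Used in training and on the robot; needs only onboard observations."""
    return apply_linear(actor_w, obs)


def value(state):
    """Used only during training; needs the privileged state."""
    return apply_linear(critic_w, state)[0]


# Training step (simulation): both networks are evaluated.
sim_state = [rng.uniform(-1, 1) for _ in range(STATE_DIM)]
sim_obs = sim_state[:OBS_DIM]  # assumption: observation is the first 47 entries
train_action = act(sim_obs)
train_value = value(sim_state)

# Deployment: only the actor runs, so no privileged input is required.
robot_obs = [0.0] * OBS_DIM
deploy_action = act(robot_obs)
```

Because `value` is never called at deployment, the robot does not need any quantity that exists only in simulation.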

Sjey-Lyn commented 5 months ago

Thank you for your answer. Are the privileged observations the ones in the red box? When I sum them up, the total doesn't equal 73. Another question: is the migration to MuJoCo just for testing? Is there any further tuning?

wangyenjen commented 5 months ago

Hi @Sjey-Lyn ,

The State (privileged observations) comprises both the Observation and the red box, resulting in a total dimension of 47 + 26 = 73. Regarding the migration to MuJoCo, it serves as a validation set, used to validate performance in an environment with different dynamics. Therefore, it should not be used for training or tuning. Hope my answer has addressed these questions. : )
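The dimension count above can be checked with a short sketch. The function name `build_state` and the concatenation order are assumptions made for illustration; the thread only states that the State is made up of the Observation (47) plus the red-box terms (26).

```python
OBS_DIM = 47    # Observation size, per the thread
EXTRA_DIM = 26  # additional red-box terms, per the thread


def build_state(obs, extra):
    """Concatenate the observation and the extra terms into the critic input."""
    if len(obs) != OBS_DIM or len(extra) != EXTRA_DIM:
        raise ValueError("unexpected input size")
    return list(obs) + list(extra)  # ordering is an assumption


state = build_state([0.0] * OBS_DIM, [1.0] * EXTRA_DIM)
print(len(state))  # 73
```

Summing only the red-box entries gives 26, which is why that total does not reach 73 on its own.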