roboterax / humanoid-gym

Humanoid-Gym: Reinforcement Learning for Humanoid Robot with Zero-Shot Sim2Real Transfer https://arxiv.org/abs/2404.05695
https://sites.google.com/view/humanoid-gym/

The function compute_ref_state() in humanoid_env.py #8

Closed: xzbreeze closed this issue 5 months ago

xzbreeze commented 5 months ago

The function that computes the reference state is very hard to understand. Why is ref_dof_pos related to the sine of the phase? What do scale_1 and scale_2 stand for? Why are only indices 2, 3, 4 and 8, 9, 10 specified?

I really appreciate your help and answer.

    def compute_ref_state(self):
        phase = self._get_phase()
        sin_pos = torch.sin(2 * torch.pi * phase)
        sin_pos_l = sin_pos.clone()
        sin_pos_r = sin_pos.clone()
        self.ref_dof_pos = torch.zeros_like(self.dof_pos)
        scale_1 = self.cfg.rewards.target_joint_pos_scale
        scale_2 = 2 * scale_1
        # left foot stance phase set to default joint pos
        sin_pos_l[sin_pos_l > 0] = 0
        self.ref_dof_pos[:, 2] = sin_pos_l * scale_1
        self.ref_dof_pos[:, 3] = sin_pos_l * scale_2
        self.ref_dof_pos[:, 4] = sin_pos_l * scale_1
        # right foot stance phase set to default joint pos
        sin_pos_r[sin_pos_r < 0] = 0
        self.ref_dof_pos[:, 8] = sin_pos_r * scale_1
        self.ref_dof_pos[:, 9] = sin_pos_r * scale_2
        self.ref_dof_pos[:, 10] = sin_pos_r * scale_1
        # Double support phase
        self.ref_dof_pos[torch.abs(sin_pos) < 0.1] = 0

        self.ref_action = 2 * self.ref_dof_pos

wangyenjen commented 5 months ago

Hi @xzbreeze,

The specific configuration for the XBot-L joint order can be found at this link. In particular, indices 2 and 8 correspond to the leg_pitch_joint, indices 3 and 9 correspond to the knee_joint, and indices 4 and 10 correspond to the ankle_pitch_joint.
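One way to read compute_ref_state() with this mapping in mind is sketched below for a single environment, using only the Python standard library instead of torch. The sine of the phase gives a smooth periodic offset, clipping it to its negative or positive half makes the two legs move in alternation, and the knee receives twice the amplitude of the leg pitch and ankle pitch joints. The names NUM_DOF, LEFT, RIGHT and ref_dof_pos are introduced here for illustration only, the default scale_1 = 0.17 is an assumed value (the real one comes from cfg.rewards.target_joint_pos_scale), and the left/right assignment follows the comments in the quoted function.

```python
import math

NUM_DOF = 12  # assumption: 12 actuated joints, as implied by indices up to 10
# Index-to-joint mapping from the reply above; left/right taken from the
# comments in compute_ref_state().
LEFT = {"leg_pitch": 2, "knee": 3, "ankle_pitch": 4}
RIGHT = {"leg_pitch": 8, "knee": 9, "ankle_pitch": 10}


def ref_dof_pos(phase, scale_1=0.17):
    """Reference joint offsets for one environment at a given gait phase.

    scale_1 is an assumed amplitude; scale_2 = 2 * scale_1 mirrors the source.
    """
    scale_2 = 2 * scale_1
    sin_pos = math.sin(2 * math.pi * phase)
    ref = [0.0] * NUM_DOF
    # Double support: near the zero crossings every reference offset is zero.
    if abs(sin_pos) < 0.1:
        return ref
    sin_l = min(sin_pos, 0.0)  # left leg is offset only while sin_pos < 0
    sin_r = max(sin_pos, 0.0)  # right leg is offset only while sin_pos > 0
    for side, s in ((LEFT, sin_l), (RIGHT, sin_r)):
        ref[side["leg_pitch"]] = s * scale_1
        ref[side["knee"]] = s * scale_2
        ref[side["ankle_pitch"]] = s * scale_1
    return ref


# Sample the start of the cycle and the two peaks of the sine.
for phase in (0.0, 0.25, 0.75):
    print(phase, [round(x, 2) for x in ref_dof_pos(phase)])
```

At phase 0.25 only indices 8, 9 and 10 are non-zero, at phase 0.75 only indices 2, 3 and 4 are, and at phase 0.0 the whole reference is zero, which matches the masking in the quoted function.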

For any remaining issues, you can refer to this paper for further details. :)

Ke-Wang1017 commented 5 months ago

Hi @xzbreeze, I found that this paper https://ieeexplore.ieee.org/abstract/document/9561814 mainly discusses parameterizing different periodic gaits and using that parameterization as a reward for bipedal locomotion. Hope it helps :)
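To illustrate how a periodic reference such as ref_dof_pos could feed into a reward, here is a minimal sketch. It assumes an exponential kernel on the L2 tracking error. The function tracking_reward and its sharpness parameter are hypothetical and are not taken from the repository or from the linked paper.

```python
import math


def tracking_reward(joint_pos, ref_pos, sharpness=2.0):
    """Hypothetical reward: 1.0 for perfect tracking, decaying with L2 error."""
    err = math.sqrt(sum((q - r) ** 2 for q, r in zip(joint_pos, ref_pos)))
    return math.exp(-sharpness * err)


# Illustrative reference offsets for one leg (leg pitch, knee, ankle pitch).
ref = [-0.17, -0.34, -0.17]
print(tracking_reward(ref, ref))              # perfect tracking
print(tracking_reward([0.0, 0.0, 0.0], ref))  # leg left at its default pose
```

A policy that follows the reference through the cycle collects a reward close to 1, while one that keeps the leg at its default pose during the swing phase is penalized in proportion to the deviation.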