About Basic_Run Training

Aike2071 commented 8 months ago

I tried to train the Walk behavior using Basic_Run.py from the codebase, but the model saved in the pkl file outputs an array of size 22 instead of 16, as in the original Walk. Do the 22 action indexes match the joint indexes of the robot?

m-abr commented 8 months ago

The walk skill that you can find at /behaviors/custom/Walk/ is a bit different from the skill that is learned through /scripts/gyms/Basic_Run.py.

The Walk has a neural network that controls 16 actions:

[0,1,2]: position of left ankle in 3 dimensions (x,y,z), relative to the center of both hip joints
[3,4,5]: position of right ankle in 3 dimensions (x,y,z), relative to the center of both hip joints
[6,7,8]: 3D orientation of left foot, rotation around (x,y,z)
[9,10,11]: 3D orientation of right foot, rotation around (x,y,z)
[12,13]: left/right arm pitch
[14,15]: left/right arm roll
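To make the 16-action layout above concrete, here is a minimal sketch that splits a Walk action vector into named parts. It assumes the vector is a flat NumPy array ordered exactly as listed; `WalkAction` and `split_walk_action` are hypothetical helpers for illustration, not part of FCPCodebase.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class WalkAction:
    """Hypothetical container for the 16 Walk actions (not an FCPCodebase class)."""
    left_ankle_pos: np.ndarray   # [0,1,2] x,y,z relative to the center of both hip joints
    right_ankle_pos: np.ndarray  # [3,4,5] x,y,z relative to the center of both hip joints
    left_foot_rot: np.ndarray    # [6,7,8] rotation around x,y,z
    right_foot_rot: np.ndarray   # [9,10,11] rotation around x,y,z
    arm_pitch: np.ndarray        # [12,13] left/right arm pitch
    arm_roll: np.ndarray         # [14,15] left/right arm roll


def split_walk_action(action) -> WalkAction:
    """Split a flat 16-element action vector following the index layout above."""
    a = np.asarray(action, dtype=float)
    if a.shape != (16,):
        raise ValueError(f"expected 16 actions, got shape {a.shape}")
    return WalkAction(
        left_ankle_pos=a[0:3],
        right_ankle_pos=a[3:6],
        left_foot_rot=a[6:9],
        right_foot_rot=a[9:12],
        arm_pitch=a[12:14],
        arm_roll=a[14:16],
    )


parts = split_walk_action(np.arange(16))
print(parts.right_foot_rot)  # [ 9. 10. 11.]
```

Passing a 22-element Basic_Run output to this helper raises a ValueError, which is a quick way to catch mixing up the two models.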

These actions are added to the output of a step trajectory generator with fixed parameters (step duration: 8, step vertical span: 0.02 m, step z max: 70%). The final action is then converted into joint positions through inverse kinematics.

In Basic_Run.py, the neural network controls 22 actions:

[0-19]: joint positions corresponding to robot joints [2-21]
[20]: step vertical span
[21]: step z max
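To answer the original question directly: only the first 20 outputs map to joints, and output i drives robot joint i + 2 (joints 0-1, the head, are skipped, as the slice(2,22) comment further below indicates). A minimal sketch of that mapping, assuming a flat 22-element NumPy array; `split_run_action` is a hypothetical helper name.

```python
import numpy as np

FIRST_JOINT = 2       # output 0 drives robot joint 2 (head joints 0-1 are not controlled)
N_JOINT_ACTIONS = 20  # outputs 0-19 drive robot joints 2-21


def split_run_action(action):
    """Split a 22-element Basic_Run action into {robot_joint: value} plus the two step parameters."""
    a = np.asarray(action, dtype=float)
    if a.shape != (N_JOINT_ACTIONS + 2,):
        raise ValueError(f"expected 22 actions, got shape {a.shape}")
    joints = {FIRST_JOINT + i: float(a[i]) for i in range(N_JOINT_ACTIONS)}
    step_z_span = float(a[20])  # step vertical span
    step_z_max = float(a[21])   # step z max
    return joints, step_z_span, step_z_max


joints, zsp, zmx = split_run_action(np.arange(22))
print(joints[2], joints[21], zsp, zmx)  # 0.0 19.0 20.0 21.0
```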

The actions are then processed as follows:

self.player.behavior.execute("Step", ...) is called to generate the next values for the step trajectory. It uses inverse kinematics internally to compute joint positions. In the first time step of each episode, the step vertical span and the step z max indicated by the neural network are used to configure the step trajectory generator. After that, the last two values produced by the neural network are ignored.
A new vector of scaled joint positions, new_action, is created from the first 20 outputs of the neural network:

new_action = self.act[:20] * 2 # scale up actions to motivate exploration

The leg joint positions computed by the Step behavior (the first processing step above) are extracted from self.step_obj and added to new_action:

new_action[[0,2,4,6,8,10]] += self.step_obj.values_l
new_action[[1,3,5,7,9,11]] += self.step_obj.values_r

Some biases are added to control the initial position of the robot:

new_action[12] -= 90 # arms down
new_action[13] -= 90 # arms down
new_action[16] += 90 # untwist arms
new_action[17] += 90 # untwist arms
new_action[18] += 90 # elbows at 90 deg
new_action[19] += 90 # elbows at 90 deg

new_action is assigned to robot joints [2-21]:

r.set_joints_target_position_direct( # commit actions:
    slice(2,22),        # act on all joints except head & toes (for robot type 4)
    new_action,         # target joint positions
    harmonize=False     # there is no point in harmonizing actions if the targets change at every step
)
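The processing steps above can be reproduced as a standalone NumPy sketch, which is handy for checking which entry of new_action ends up on which robot joint. It assumes values_l and values_r are 6-element arrays (as the index lists above imply) and that targets are in degrees (as the "elbows at 90 deg" comment suggests); `compose_joint_targets` is a hypothetical function, not the actual Basic_Run.py code, and it stops short of calling the robot API.

```python
import numpy as np

LEFT_LEG = [0, 2, 4, 6, 8, 10]   # new_action entries that receive Step's left-leg targets
RIGHT_LEG = [1, 3, 5, 7, 9, 11]  # new_action entries that receive Step's right-leg targets


def compose_joint_targets(act, values_l, values_r):
    """Hypothetical re-creation of the new_action pipeline described above.

    act:      the 22 network outputs (only the first 20 are used here)
    values_l: 6 left-leg joint targets from the Step behavior (self.step_obj.values_l)
    values_r: 6 right-leg joint targets from the Step behavior (self.step_obj.values_r)
    Returns 20 target positions for robot joints 2..21 (i.e. slice(2, 22)).
    """
    act = np.asarray(act, dtype=float)
    new_action = act[:20] * 2         # scale up actions (creates a new array)
    new_action[LEFT_LEG] += values_l  # add the Step trajectory, left leg
    new_action[RIGHT_LEG] += values_r # add the Step trajectory, right leg
    new_action[[12, 13]] -= 90        # arms down
    new_action[[16, 17]] += 90        # untwist arms
    new_action[[18, 19]] += 90        # elbows at 90 deg
    return new_action


# With all-zero network outputs and Step targets, only the posture biases remain.
targets = compose_joint_targets(np.zeros(22), np.zeros(6), np.zeros(6))
print(targets[12:].tolist())  # [-90.0, -90.0, 0.0, 0.0, 90.0, 90.0, 90.0, 90.0]
```

Since new_action[k] is sent to robot joint k + 2, the biases on new_action[12] and new_action[13] land on robot joints 14 and 15.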

Note that these 2 approaches are just examples of what can be accomplished. You can modify these control methods to better suit your needs.

Aike2071 commented 8 months ago

Thank you so much!!

m-abr / FCPCodebase

About Basic_Run Training #13