chance20210722 closed this issue 7 months ago.
There are at least two options:
The first option is to optimize the slot behavior: you can optimize the keyframes of the existing Kick_Motion.xml file for every robot type. I've uploaded a gym example that uses PPO to optimize the Get Up behaviors, which are also implemented with keyframes.
The gym is here: FCPCodebase/scripts/gyms/Get_Up.py
To use this gym you would have to adapt it to train the Kick_Motion instead of the Get_Up behavior. During each training episode, the optimization method written in the provided gym starts by retrieving the values for each keyframe. If the behavior has 5 keyframes, the episode will have 5 steps. Only in the final step is the behavior actually executed and evaluated, generating a single reward for the entire episode.
At the end of the training process, it generates a new XML file with the optimized joint angles per keyframe that you can use to replace the old Kick_Motion.xml of every robot type.
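The episode structure described above can be sketched roughly as follows. This is only an illustration and not the code in Get_Up.py: the class name `KeyframeEpisode` and the `evaluate` callback are hypothetical stand-ins for the real gym and the simulator.

```python
from typing import Callable, List, Sequence, Tuple


class KeyframeEpisode:
    """Toy episode: one step per keyframe, a single reward at the end.

    Hypothetical stand-in for a gym; `evaluate` replaces the simulator.
    """

    def __init__(self, n_keyframes: int,
                 evaluate: Callable[[List[Sequence[float]]], float]):
        self.n_keyframes = n_keyframes
        self.evaluate = evaluate
        self.reset()

    def reset(self) -> None:
        self.current = 0
        self.collected: List[Sequence[float]] = []

    def step(self, action: Sequence[float]) -> Tuple[float, bool]:
        """Store the action for the current keyframe; return (reward, done)."""
        if self.current >= self.n_keyframes:
            raise RuntimeError("episode finished; call reset()")
        self.collected.append(action)
        self.current += 1
        if self.current < self.n_keyframes:
            # Intermediate step: nothing is executed yet, so no reward.
            return 0.0, False
        # Final step: only now is the whole behavior executed and scored.
        return self.evaluate(self.collected), True


def run_demo() -> List[Tuple[float, bool]]:
    # Placeholder score; in a real gym it would come from the simulation.
    episode = KeyframeEpisode(5, evaluate=lambda keyframes: float(len(keyframes)))
    return [episode.step([0.0]) for _ in range(5)]
```

With 5 keyframes, `run_demo()` yields four zero-reward steps followed by one final step that carries the reward for the entire episode.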
You can also optimize the keyframes using CMA-ES. But to do that you have to further adapt the provided gym.
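Black-box optimizers such as CMA-ES usually work on a single flat parameter vector, so part of adapting the gym would be mapping the slots to and from such a vector. A minimal sketch, assuming each slot is held as a `(delta, indices, angles)` tuple; the helper names `slots_to_vector` and `vector_to_slots` are hypothetical and not part of the codebase:

```python
from typing import List, Tuple

Slot = Tuple[float, List[int], List[float]]  # (delta, joint indices, joint angles)


def slots_to_vector(slots: List[Slot]) -> List[float]:
    """Flatten all deltas and angles into one parameter vector."""
    vector: List[float] = []
    for delta, _indices, angles in slots:
        vector.append(delta)
        vector.extend(angles)
    return vector


def vector_to_slots(vector: List[float], template: List[Slot]) -> List[Slot]:
    """Rebuild slots from a vector, reusing the joint indices of `template`."""
    slots: List[Slot] = []
    pos = 0
    for _delta, indices, _angles in template:
        delta = vector[pos]
        angles = list(vector[pos + 1: pos + 1 + len(indices)])
        slots.append((delta, list(indices), angles))
        pos += 1 + len(indices)
    if pos != len(vector):
        raise ValueError("vector length does not match the template")
    return slots


# The two slots of the r0 kick, written as tuples for illustration.
KICK_SLOTS: List[Slot] = [
    (0.22, [5, 6, 7, 8, 9, 10, 11], [-10.0, 40.0, 65.0, -60.0, -115.0, 60.0, 10.0]),
    (0.12, [3, 6, 7, 8, 9, 10], [-45.0, -25.0, 80.0, 0.0, 0.0, 30.0]),
]
```

The optimizer would then propose candidate vectors, each of which is converted back with `vector_to_slots`, executed in the simulator, and scored (for example by the negative kick distance if the optimizer minimizes).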
The second option is to train a behavior from scratch by creating a new gym. This would be my preferred approach.
As a suggestion, in your new gym, you can use the reset function to walk towards the ball using the internal Walk behavior. When close enough to the ball, exit the reset function. Then, make a step function that returns a reward of zero for the first time steps (e.g. 15 steps), and, at the final step, call self.sync() in a loop to let the simulation unroll and, finally, evaluate the final position of the ball and generate an appropriate reward, depending on your primary objective.
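The reward schedule suggested above could look roughly like this. It is a sketch under stated assumptions: `KickEpisode`, `FakeSim`, `kick_reward`, and the value of `settle_cycles` are all hypothetical, the walk towards the ball is omitted, and the injected `sync` callable stands in for the gym's self.sync().

```python
import math
from typing import Callable, Tuple

Vec2 = Tuple[float, float]


def kick_reward(ball_start: Vec2, ball_end: Vec2) -> float:
    """Example objective: distance travelled by the ball."""
    return math.hypot(ball_end[0] - ball_start[0], ball_end[1] - ball_start[1])


class KickEpisode:
    def __init__(self, sync: Callable[[], None], get_ball_pos: Callable[[], Vec2],
                 n_steps: int = 15, settle_cycles: int = 150):
        self.sync = sync
        self.get_ball_pos = get_ball_pos
        self.n_steps = n_steps
        self.settle_cycles = settle_cycles  # assumed value; tune for your setup

    def reset(self) -> None:
        # A real gym would first walk to the ball here (omitted in this sketch).
        self.step_count = 0
        self.ball_start = self.get_ball_pos()

    def step(self, action) -> Tuple[float, bool]:
        # A real gym would apply `action` to the joints before syncing.
        self.sync()
        self.step_count += 1
        if self.step_count < self.n_steps:
            return 0.0, False  # zero reward until the final step
        for _ in range(self.settle_cycles):
            self.sync()  # let the simulation unroll so the ball can stop
        return kick_reward(self.ball_start, self.get_ball_pos()), True


class FakeSim:
    """Tiny stand-in simulator: the ball moves 0.05 per cycle for 20 cycles."""

    def __init__(self) -> None:
        self.cycle = 0
        self.ball: Vec2 = (0.0, 0.0)

    def sync(self) -> None:
        self.cycle += 1
        if 15 < self.cycle <= 35:
            self.ball = (self.ball[0] + 0.05, self.ball[1])

    def ball_pos(self) -> Vec2:
        return self.ball
```

Depending on your primary objective, `kick_reward` could instead measure distance along a target direction or penalize lateral deviation.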
Thank you so much. (Sent in reply to Miguel Abreu's comment of Thursday, March 14, 2024, 07:51.)
If I want to use the Get_Up gym for training, how should I modify terminal? Also, the provided Kick_Motion currently has only two slots. How can I make sure that additional slots are correctly recognized and trained? During training, I found that only the joint parameters change, while delta is not altered. What should I do so that delta is changed as well?
There is one kick motion per robot, and they can be found here:
FCPCodebase/behaviors/slot/r0/Kick_Motion.xml
FCPCodebase/behaviors/slot/r1/Kick_Motion.xml
FCPCodebase/behaviors/slot/r2/Kick_Motion.xml
FCPCodebase/behaviors/slot/r3/Kick_Motion.xml
FCPCodebase/behaviors/slot/r4/Kick_Motion.xml
Suppose you want to train a new kick motion for robot type 0 with 3 slots.
First, duplicate FCPCodebase/behaviors/slot/r0/Kick_Motion.xml
to create a new file:
FCPCodebase/behaviors/slot/r0/Your_Kick_Motion_with_3_slots.xml
Then, manually add a 3rd slot:
<?xml version="1.0" encoding="utf-8"?>
<behavior description="Kick motion with right leg" auto_head="1">
    <slot delta="0.22"> <!-- Lean -->
        <move id="5" angle="-10"/> <!-- Left leg roll -->
        <move id="6" angle="40"/> <!-- Left leg pitch -->
        <move id="7" angle="65"/> <!-- Right leg pitch -->
        <move id="8" angle="-60"/> <!-- Left knee -->
        <move id="9" angle="-115"/> <!-- Right knee -->
        <move id="10" angle="60"/> <!-- Left foot pitch -->
        <move id="11" angle="10"/> <!-- Right foot pitch -->
    </slot>
    <slot delta="0.12"> <!-- Kick -->
        <move id="3" angle="-45"/> <!-- Right leg yaw/pitch -->
        <move id="6" angle="-25"/> <!-- Left leg pitch -->
        <move id="7" angle="80"/> <!-- Right leg pitch -->
        <move id="8" angle="0"/> <!-- Left knee -->
        <move id="9" angle="0"/> <!-- Right knee -->
        <move id="10" angle="30"/> <!-- Left foot pitch -->
    </slot>
    <slot delta="0.12"> <!-- Kick (duplicated) -->
        <move id="3" angle="-45"/> <!-- Right leg yaw/pitch -->
        <move id="6" angle="-25"/> <!-- Left leg pitch -->
        <move id="7" angle="80"/> <!-- Right leg pitch -->
        <move id="8" angle="0"/> <!-- Left knee -->
        <move id="9" angle="0"/> <!-- Right knee -->
        <move id="10" angle="30"/> <!-- Left foot pitch -->
    </slot>
</behavior>
Note that I simply duplicated the last slot. Adding a new slot is as simple as this. In every slot you can define the desired angle for every joint you wish to control. If a joint is not mentioned in a slot, then that joint is not moved.
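If you would rather add the slot with a script than by hand, something along these lines should work with Python's standard library. The helper names `duplicate_last_slot` and `read_slots` are hypothetical, and the shortened `EXAMPLE` string is only for illustration.

```python
import copy
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple


def duplicate_last_slot(xml_text: str) -> str:
    """Return the behavior XML with a copy of its last <slot> appended."""
    # Note: the default parser drops XML comments such as <!-- Kick -->.
    root = ET.fromstring(xml_text)
    slots = root.findall("slot")
    if not slots:
        raise ValueError("behavior has no <slot> elements")
    root.append(copy.deepcopy(slots[-1]))
    return ET.tostring(root, encoding="unicode")


def read_slots(xml_text: str) -> List[Tuple[float, Dict[int, float]]]:
    """Return (delta, {joint id: angle}) for every slot, in order."""
    root = ET.fromstring(xml_text)
    return [
        (float(slot.get("delta")),
         {int(m.get("id")): float(m.get("angle")) for m in slot.findall("move")})
        for slot in root.findall("slot")
    ]


# Shortened two-slot behavior, used only for illustration.
EXAMPLE = """<behavior description="Kick motion with right leg" auto_head="1">
  <slot delta="0.22"><move id="5" angle="-10"/></slot>
  <slot delta="0.12"><move id="3" angle="-45"/><move id="7" angle="80"/></slot>
</behavior>"""
```

Calling `read_slots(duplicate_last_slot(EXAMPLE))` returns three slots, the last two being identical. Because comments are lost when parsing this way, editing the file by hand may be preferable if you want to keep the joint annotations.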
The gym Get_Up.py currently optimizes 20 joints (from joint 2 to joint 21). This means that if a certain joint is not specified in the XML, it is assumed to be zero (which is different from not moving!).
To optimize only the joints that are present in the XML you would have to modify Get_Up.py.
If you prefer not to modify Get_Up.py, you can manually specify every joint in every slot of the XML file.
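To make the difference between "zero" and "not moved" concrete, here is a small sketch. It is not the actual `get_22_angles` implementation: both function names are hypothetical, and the joint count of 22 is an assumption based on that function's name.

```python
from typing import Dict, List, Sequence

N_JOINTS = 22  # assumption, taken from the name get_22_angles


def expand_with_zeros(indices: Sequence[int], angles: Sequence[float],
                      n_joints: int = N_JOINTS) -> List[float]:
    """Unspecified joints become 0, i.e. they are commanded to angle zero."""
    full = [0.0] * n_joints
    for idx, angle in zip(indices, angles):
        full[idx] = angle
    return full


def expand_keep_previous(slot_angles: Dict[int, float],
                         previous_full: Sequence[float]) -> List[float]:
    """Unspecified joints keep the previous slot's target instead of 0."""
    full = list(previous_full)
    for idx, angle in slot_angles.items():
        full[idx] = angle
    return full
```

For example, if the first slot sets joint 5 to -10 and the second slot only mentions joint 3, `expand_with_zeros` sends joint 5 back to 0 in the second slot, while `expand_keep_previous` leaves it at -10. Carrying the previous target forward is one way to fill in every joint of every slot without changing the motion, although the starting angles before the first slot still have to be chosen by hand.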
Regarding the delta, Get_Up.py is already optimizing it!
Every time the step function is called, it corresponds to a single slot, for which the new delta is assigned here:
def step(self, action):
    # action: 1 delta + 10 joints
    r = self.player.world.robot
    action = Get_Up.scale_action(action)
    delta, indices, angles = self.original_slots[self.current_slot]
    angles = Get_Up.get_22_angles(angles, indices)
    angles[2:] += action[1:]  # exclude head
    new_delta = max((delta + action[0])//20*20, 20)  # <-- the new delta is assigned here
To better understand the code above, I need to explain how the action vector is organized.
The action is composed of 11 values, which correspond to 1 delta value (which is added to the default delta) and 10 joint values.
There are only 10 joint values because the Get Up behavior is symmetric, so to obtain the full 20 joints, we simply expand the symmetric joints as seen in the scale_action function:
@staticmethod
def scale_action(action : np.ndarray):
    # 1 delta + 10 joints in, 1 delta + 20 joints out
    new_action = np.zeros(len(action)*2-1, action.dtype)
    new_action[0] = action[0] * 10  # delta: scaled by 10 (ms)
    new_action[1:] = np.repeat(action[1:] * 3, 2)  # expand symmetrical actions
    return new_action
In the above function we scale actions, multiplying the delta by 10 (ms), and the joint actions by 3. Additionally, we repeat every joint action twice to control 20 joints, so that the returned new_action has 21 elements.
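As a worked example of the delta arithmetic, the sketch below reproduces the scaling by 10 and the expression `max((delta + action[0])//20*20, 20)` from the step function. The function names are hypothetical, and it assumes the stored delta is expressed in milliseconds (e.g. 220 for a slot with delta="0.22").

```python
def scaled_delta_change(raw_action: float) -> float:
    """Same scaling as scale_action applies to the delta component."""
    return raw_action * 10


def quantize_delta(delta_ms: float, change_ms: float, step_ms: int = 20) -> float:
    """Round (delta + change) down to a multiple of step_ms, never below step_ms."""
    return max((delta_ms + change_ms) // step_ms * step_ms, step_ms)


# A raw action of 1.37 is scaled to about 13.7 ms, which is lost in the rounding.
small = quantize_delta(220, scaled_delta_change(1.37))
# A raw action of 2.5 is scaled to 25 ms, which moves delta up by one 20 ms step.
larger = quantize_delta(220, scaled_delta_change(2.5))
# A large negative change is clamped to the 20 ms minimum.
clamped = quantize_delta(120, -150)
```

Here `small` is 220.0, `larger` is 240.0 and `clamped` is 20. Because of the rounding to multiples of 20, a small raw action leaves the delta unchanged, so the delta only moves when the policy outputs a large enough value for that component.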
Note that to train a kick, you should not duplicate actions as done in scale_action, because you do not want a symmetric behavior.
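A non-symmetric variant could look like the sketch below. It is an assumption about how one might adapt the function, not code from the repository: the name `scale_action_asymmetric` is hypothetical, and it expects the policy to output 21 values (1 delta + 20 independent joints) instead of 11, so the action space of the gym would have to be enlarged accordingly.

```python
import numpy as np


def scale_action_asymmetric(action: np.ndarray) -> np.ndarray:
    """Scale 1 delta + 20 independent joint values, without mirroring."""
    new_action = np.empty(len(action), action.dtype)
    new_action[0] = action[0] * 10   # delta change in ms, same factor as scale_action
    new_action[1:] = action[1:] * 3  # joint changes, same factor as scale_action
    return new_action
```

The scaling factors are kept identical to the original so that only the symmetry assumption changes.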
I hope my explanation was clear :)
I configured the reinforcement learning environment following the "FC Portugal Codebase". I optimize the Get Up behavior through the Run_Utils.py script, but I can't seem to do more than one iteration. The following error appears every time after completing an iteration:
Traceback (most recent call last):
File "Run_Utils.py", line 93, in
thanks
I think this issue was already addressed in #3. Check if this answer solves your problem.
Hello, by modifying the "Get_Up" sample code you provided, we have now trained Type 1 robots to kick up to a maximum distance of 14 meters. However, we still have several problems.
Could you clarify when the robot falls? Does it happen during the kick or after? If it's after the kick, make sure you wait enough time after the kick is executed, during a training episode, to check if it falls. However, falling after a long kick is usually not a problem. If it fails to kick in-game but not while training, there may be a mismatch between the starting conditions in both situations. The approach to the ball should be the same during training and during an actual game.
Hyperparameter selection varies depending on the skill being trained and the specific environment configuration. I recommend experimenting with small variations in hyperparameters to understand how they affect the results. Additionally, if you find that the training ends too early, consider increasing the total_steps variable.
We are referring to the fall after performing the kicking motion. We will keep trying. Thank you for the advice.
One question I have is about how to optimize the robot's kicking motion. I saw that the kicking action is composed of multiple frames, and each frame specifies the robot's joint angles. Can CMA-ES be used to optimize the kicking action?