m-abr / FCPCodebase

FC Portugal Codebase
GNU General Public License v3.0

How to optimize your kicking motion #5

Closed chance20210722 closed 7 months ago

chance20210722 commented 8 months ago

I have a question about how to optimize the robot's kicking motion. I saw that the kick is composed of multiple frames, each of which specifies the robot's joint angles. Can CMA-ES be used to optimize the kicking motion?

m-abr commented 8 months ago

There are at least two options:

1. Optimize the slot behavior

You can optimize the keyframes of the existing Kick_Motion.xml file for every robot type. I've uploaded a gym example that uses PPO to optimize the Get Up behaviors, which are also implemented with keyframes.

The gym is here: FCPCodebase/scripts/gyms/Get_Up.py

To use this gym you would have to adapt it to train the Kick_Motion instead of the Get_Up behavior. During each training episode, the optimization method written in the provided gym starts by retrieving the values for each keyframe. If the behavior has 5 keyframes, the episode will have 5 steps. Only in the final step is the behavior actually executed and evaluated, generating a single reward for the entire episode.
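
The episode structure described above can be sketched roughly as follows. This is a simplified illustration, not the code of Get_Up.py: the class `KeyframeEpisode` and the `evaluate` callback are hypothetical names, and the real gym runs the behavior in the simulator instead of calling a Python function.

```python
from typing import Callable, List, Sequence, Tuple


class KeyframeEpisode:
    """One episode = one step per keyframe; the reward arrives only at the end."""

    def __init__(self, n_keyframes: int,
                 evaluate: Callable[[List[Sequence[float]]], float]):
        self.n_keyframes = n_keyframes
        self.evaluate = evaluate  # hypothetical: runs the behavior, returns a score
        self.collected: List[Sequence[float]] = []

    def reset(self) -> int:
        self.collected = []
        return 0  # observation: index of the next keyframe

    def step(self, action: Sequence[float]) -> Tuple[int, float, bool]:
        self.collected.append(action)  # store the values for the current keyframe
        done = len(self.collected) == self.n_keyframes
        # The behavior is executed and evaluated only once, at the final step.
        reward = self.evaluate(self.collected) if done else 0.0
        return len(self.collected), reward, done


# Toy evaluation standing in for the simulator: the sum of all values.
episode = KeyframeEpisode(5, evaluate=lambda frames: sum(sum(f) for f in frames))
episode.reset()
rewards = [episode.step([1.0, 2.0])[1] for _ in range(5)]
print(rewards)  # four zeros, then the single episode reward
```

With 5 keyframes the agent takes 5 steps, and only the last one carries a non-zero reward.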

At the end of the training process, it generates a new XML file with the optimized joint angles per keyframe that you can use to replace the old Kick_Motion.xml of every robot type.
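
As a rough idea of what writing such a file involves, here is a minimal sketch using only the Python standard library. The helper `slots_to_xml` is a hypothetical name, the angle values are placeholders rather than training results, and the gym's own export code may look different.

```python
import xml.etree.ElementTree as ET
from xml.dom import minidom


def slots_to_xml(description, slots):
    """Build a slot-behavior XML string.

    slots: list of (delta_seconds, [(joint_id, angle_degrees), ...]).
    """
    root = ET.Element("behavior", {"description": description, "auto_head": "1"})
    for delta, joints in slots:
        slot = ET.SubElement(root, "slot", {"delta": f"{delta:g}"})
        for joint_id, angle in joints:
            ET.SubElement(slot, "move",
                          {"id": str(joint_id), "angle": f"{angle:g}"})
    raw = ET.tostring(root, encoding="unicode")
    return minidom.parseString(raw).toprettyxml(indent="    ")


# Placeholder values for illustration only (not the output of any training run).
optimized = [
    (0.22, [(5, -12.5), (6, 38.0)]),
    (0.12, [(3, -45.0), (7, 83.2)]),
]
xml_text = slots_to_xml("Kick motion with right leg", optimized)
print(xml_text)
```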

You can also optimize the keyframes using CMA-ES. But to do that you have to further adapt the provided gym.
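
To give an idea of what that adaptation looks like: a black-box optimizer such as CMA-ES works on one flat parameter vector and needs a single score per full execution of the behavior, so there is no per-keyframe stepping. The sketch below uses a much simpler Gaussian evolution strategy as a stand-in, not CMA-ES itself, and `evaluate_kick` is a hypothetical placeholder for running the kick in the simulator.

```python
import random


def evaluate_kick(params):
    """Hypothetical stand-in for executing the kick and measuring the result.

    Returns a score to maximize; here a toy function with its optimum at 0.5.
    """
    return -sum((p - 0.5) ** 2 for p in params)


def simple_es(x0, sigma=0.3, population=16, elite=4, generations=40, seed=0):
    """Simplified Gaussian evolution strategy (NOT full CMA-ES)."""
    rng = random.Random(seed)
    mean = list(x0)
    for _ in range(generations):
        # Sample candidate parameter vectors around the current mean.
        candidates = [[m + sigma * rng.gauss(0, 1) for m in mean]
                      for _ in range(population)]
        # One full behavior execution per candidate, one score each.
        candidates.sort(key=evaluate_kick, reverse=True)
        best = candidates[:elite]
        # Move the mean to the average of the best candidates.
        mean = [sum(c[i] for c in best) / elite for i in range(len(mean))]
        sigma *= 0.95  # shrink the search radius over time
    return mean


# Flat vector: (1 delta offset + 3 joint offsets) for each of 2 keyframes.
x0 = [0.0] * 8
best = simple_es(x0)
print(evaluate_kick(x0), evaluate_kick(best))
```

A real CMA-ES implementation additionally adapts a full covariance matrix and the step size; the sample, evaluate, update loop keeps the same shape.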

2. Train a behavior from scratch

You can also train a behavior from scratch by creating a new gym. This would be my preferred approach.

As a suggestion, in your new gym, you can use the reset function to walk towards the ball using the internal Walk behavior. When close enough to the ball, exit the reset function. Then, make a step function that returns a reward of zero for the first time steps (e.g. 15 steps), and, at the final step, call self.sync() in a loop to let the simulation unroll and, finally, evaluate the final position of the ball and generate an appropriate reward, depending on your primary objective.
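
The suggestion above could be structured along these lines. Everything here is an assumption made for illustration: `FakeSim` is a toy stand-in for the simulator and agent, and `KickEnv`, `SETTLE_CYCLES` and the reward choice are hypothetical. Only the overall flow (walk in reset, zero reward for the first steps, unroll and evaluate at the final step) follows the text.

```python
class FakeSim:
    """Hypothetical stand-in for the simulator and agent (not the FCP API)."""

    def __init__(self):
        self.robot_x, self.ball_x, self.ball_speed = -1.0, 0.0, 0.0

    def walk_step(self):             # placeholder for the internal Walk behavior
        self.robot_x = min(self.robot_x + 0.05, self.ball_x - 0.15)

    def apply_action(self, action):  # placeholder: positive actions push the ball
        self.ball_speed += 0.1 * max(sum(action), 0.0)

    def sync(self):                  # placeholder for advancing one cycle
        self.ball_x += self.ball_speed * 0.02
        self.ball_speed *= 0.98


class KickEnv:
    KICK_STEPS = 15      # zero-reward steps (example value from the text)
    SETTLE_CYCLES = 300  # assumed number of cycles to let the ball roll

    def reset(self):
        self.sim, self.t = FakeSim(), 0
        # Walk towards the ball and leave reset once close enough.
        while self.sim.ball_x - self.sim.robot_x > 0.16:
            self.sim.walk_step()
            self.sim.sync()
        return self._obs()

    def _obs(self):
        return [self.sim.ball_x - self.sim.robot_x]

    def step(self, action):
        self.sim.apply_action(action)
        self.sim.sync()
        self.t += 1
        if self.t < self.KICK_STEPS:
            return self._obs(), 0.0, False
        # Final step: let the simulation unroll, then score the ball position.
        for _ in range(self.SETTLE_CYCLES):
            self.sim.sync()
        return self._obs(), self.sim.ball_x, True  # reward: final ball x


env = KickEnv()
env.reset()
results = [env.step([1.0]) for _ in range(KickEnv.KICK_STEPS)]
rewards = [r for _, r, _ in results]
```

The reward here is simply the final ball position along x; for accuracy or height objectives the last line of `step` is the part to change.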

chance20210722 commented 8 months ago

Thank you so much.

Dionysus7777777 commented 8 months ago

If I want to use the Get_Up gym for training, how should I modify terminal? Also, the provided Kick_Motion currently has only two slots. How can I ensure that additional slots are correctly recognized and trained? During training, I found that only the joint parameters change, while delta does not. What should I do so that delta is changed as well?

m-abr commented 8 months ago

There is one kick motion per robot, and they can be found here:

FCPCodebase/behaviors/slot/r0/Kick_Motion.xml
FCPCodebase/behaviors/slot/r1/Kick_Motion.xml
FCPCodebase/behaviors/slot/r2/Kick_Motion.xml
FCPCodebase/behaviors/slot/r3/Kick_Motion.xml
FCPCodebase/behaviors/slot/r4/Kick_Motion.xml

Suppose you want to train a new kick motion for robot type 0 with 3 slots. First, duplicate FCPCodebase/behaviors/slot/r0/Kick_Motion.xml to create a new file:

FCPCodebase/behaviors/slot/r0/Your_Kick_Motion_with_3_slots.xml

Then, manually add a 3rd slot:

<?xml version="1.0" encoding="utf-8"?>

<behavior description="Kick motion with right leg" auto_head="1">
    <slot delta="0.22">    <!-- Lean -->
        <move id="5" angle="-10"/>  <!-- Left  leg roll -->
        <move id="6" angle="40"/>   <!-- Left  leg pitch -->
        <move id="7" angle="65"/>   <!-- Right leg pitch -->
        <move id="8" angle="-60"/>  <!-- Left  knee -->
        <move id="9" angle="-115"/> <!-- Right knee -->
        <move id="10" angle="60"/>  <!-- Left  foot pitch -->
        <move id="11" angle="10"/>  <!-- Right foot pitch -->
    </slot>
    <slot delta="0.12">    <!-- Kick -->
        <move id="3" angle="-45"/> <!-- Right leg yaw/pitch -->
        <move id="6" angle="-25"/> <!-- Left  leg pitch -->
        <move id="7" angle="80"/>  <!-- Right leg pitch -->
        <move id="8" angle="0"/>   <!-- Left knee -->
        <move id="9" angle="0"/>   <!-- Right knee -->
        <move id="10" angle="30"/> <!-- Left  foot pitch -->
    </slot>
    <slot delta="0.12">    <!-- Kick (duplicated) -->
        <move id="3" angle="-45"/> <!-- Right leg yaw/pitch -->
        <move id="6" angle="-25"/> <!-- Left  leg pitch -->
        <move id="7" angle="80"/>  <!-- Right leg pitch -->
        <move id="8" angle="0"/>   <!-- Left knee -->
        <move id="9" angle="0"/>   <!-- Right knee -->
        <move id="10" angle="30"/> <!-- Left  foot pitch -->
    </slot>
</behavior>

Note that I simply duplicated the last slot. Adding a new slot is as simple as this. In every slot you can define the desired angle for every joint you wish to control. If a joint is not mentioned in a slot, then that joint is not moved.

In the gym Get_Up.py, it currently optimizes 20 joints (from joint 2 to joint 21). This means that if a certain joint is not specified in the XML, it is assumed to be zero (which is different from not moving!). To optimize only the joints that are present in the XML you would have to modify Get_Up.py. If you prefer not to modify Get_Up.py, you can manually specify every joint in every slot of the XML file.
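
The difference between "not moved" and "assumed to be zero" can be illustrated with a small sketch. The helpers `parse_slots`, `expand_with_zeros` and `hold_unlisted` are hypothetical names, and the internals of the repository's `get_22_angles` are not shown in this thread, so treat this as an assumption-based illustration of the behavior described above.

```python
import xml.etree.ElementTree as ET

SLOT_XML = """
<behavior description="Kick motion with right leg" auto_head="1">
    <slot delta="0.12">
        <move id="3" angle="-45"/>
        <move id="7" angle="80"/>
    </slot>
</behavior>
"""


def parse_slots(xml_text):
    """Return a list of (delta, joint_indices, angles), one entry per slot."""
    slots = []
    for slot in ET.fromstring(xml_text).iter("slot"):
        moves = slot.findall("move")
        slots.append((float(slot.get("delta")),
                      [int(m.get("id")) for m in moves],
                      [float(m.get("angle")) for m in moves]))
    return slots


def expand_with_zeros(indices, angles, n_joints=22):
    """Dense vector: unlisted joints become an explicit 0 degree target."""
    dense = [0.0] * n_joints
    for i, a in zip(indices, angles):
        dense[i] = a
    return dense


def hold_unlisted(indices, angles, current_pose):
    """Alternative: unlisted joints keep their current angle (are not moved)."""
    dense = list(current_pose)
    for i, a in zip(indices, angles):
        dense[i] = a
    return dense


current_pose = [12.0] * 22  # made-up pose, only to make the difference visible
delta, idx, ang = parse_slots(SLOT_XML)[0]
zeros = expand_with_zeros(idx, ang)
held = hold_unlisted(idx, ang, current_pose)
print(zeros[5], held[5])  # joint 5 is not listed in the slot
```

Joint 5 is driven to 0 degrees in the first variant and stays at its current angle in the second, which is why listing every joint in every slot avoids surprises.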

Regarding the delta, Get_Up.py is already optimizing it! Every time the step function is called, it corresponds to a single slot, for which the new delta is assigned here:

    def step(self, action):
        # action: 1 delta + 10 joints
        r = self.player.world.robot
        action = Get_Up.scale_action(action)

        delta, indices, angles = self.original_slots[self.current_slot]
        angles = Get_Up.get_22_angles(angles, indices)

        angles[2:] += action[1:]  # exclude head
        new_delta = max((delta + action[0])//20*20, 20)  # <-- the new delta is assigned here

To better understand the code above, I need to explain how the action vector is organized. The action has 11 values: 1 delta offset (which is added to the default delta) and 10 joint values. There are only 10 joint values because the Get Up behavior is symmetric, so to obtain the full 20 joints, we simply expand the symmetric joints, as seen in the scale_action function:

    @staticmethod
    def scale_action(action : np.ndarray):
        new_action = np.zeros(len(action)*2-1,action.dtype) 
        new_action[0]  = action[0] * 10
        new_action[1:] = np.repeat(action[1:] * 3,2) # expand symmetrical actions

        return new_action

In the above function we scale the actions, multiplying the delta by 10 (ms) and the joint actions by 3. Additionally, we repeat every joint action twice to control 20 joints, so the returned new_action has 21 elements.
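
A small worked example of the numbers involved, written in plain Python instead of NumPy so that it runs on its own. It assumes the default delta is expressed in milliseconds at the point where the new delta is computed; `new_delta` is a hypothetical helper that wraps the expression from the step function.

```python
def scale_action(action):
    """Pure-Python mirror of the scaling described above (the gym uses NumPy)."""
    delta = action[0] * 10  # delta offset, in ms
    # Each joint value is scaled by 3 and emitted twice (symmetric joints).
    joints = [a * 3 for a in action[1:] for _ in range(2)]
    return [delta] + joints


def new_delta(default_delta_ms, delta_offset_ms):
    """Round down to a multiple of 20, never going below 20."""
    return max((default_delta_ms + delta_offset_ms) // 20 * 20, 20)


raw = [1.3] + [0.5] * 10          # 1 delta value + 10 joint values
scaled = scale_action(raw)
print(len(scaled))                # 21
print(new_delta(220, scaled[0]))  # 220 + 13 = 233, rounded down to 220.0
print(new_delta(20, -50))         # clamped to 20
```

Because of the rounding to multiples of 20, small delta actions often leave a slot's duration unchanged.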

Note that to train a kick, you should not duplicate actions as done in scale_action because you do not want a symmetric behavior.
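
One possible shape for a non-symmetric variant, again in plain Python and with a hypothetical function name. It keeps the same scaling factors but does not mirror the joint values, so the policy has to output one value per controlled joint (21 values for 1 delta + 20 joints instead of 11).

```python
def scale_action_asymmetric(action):
    """One value per joint, no mirroring: 1 delta + N independent joints."""
    delta = action[0] * 10                # same delta scaling as the original
    joints = [a * 3 for a in action[1:]]  # same joint scaling, not repeated
    return [delta] + joints


raw = [0.5] + [1.0, -1.0] * 10  # 1 delta value + 20 independent joint values
scaled = scale_action_asymmetric(raw)
print(len(scaled))              # 21
```

The action space declared by the gym would have to grow accordingly.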

I hope my explanation was clear :)

chance20210722 commented 8 months ago

I configured the reinforcement learning environment following the FC Portugal Codebase, and I am optimizing the Get Up behavior through the Run_Utils.py script. However, I can't seem to get past the first iteration. The following error appears every time an iteration completes:

Traceback (most recent call last):
  File "Run_Utils.py", line 93, in <module>
    main()
  File "Run_Utils.py", line 81, in main
    mod.Train(script).train(dict())
  File "/home/h/桌面/FCP/FCPCodebase/scripts/gyms/Get_Up.py", line 195, in train
    model_path = self.learn_model( model, total_steps, model_path, eval_env=eval_env, eval_freq=n_steps_per_env*10, backup_env_file=__file__ )
  File "/home/h/桌面/FCP/FCPCodebase/scripts/commons/Train_Base.py", line 279, in learn_model
    model.save( os.path.join(path, "last_model") )
  File "/home/h/桌面/FCP/stable-baselines3/stable_baselines3/common/base_class.py", line 837, in save
    save_to_zip_file(path, data=data, params=params_to_save, pytorch_variables=pytorch_variables)
  File "/home/h/桌面/FCP/stable-baselines3/stable_baselines3/common/save_util.py", line 309, in save_to_zip_file
    serialized_data = data_to_json(data)
  File "/home/h/桌面/FCP/stable-baselines3/stable_baselines3/common/save_util.py", line 99, in data_to_json
    base64_encoded = base64.b64encode(cloudpickle.dumps(data_item)).decode()
  File "/home/h/桌面/FCP/FCPCodebase/kick_3.8/lib/python3.8/site-packages/cloudpickle/cloudpickle.py", line 1479, in dumps
    cp.dump(obj)
  File "/home/h/桌面/FCP/FCPCodebase/kick_3.8/lib/python3.8/site-packages/cloudpickle/cloudpickle.py", line 1245, in dump
    return super().dump(obj)
  File "/usr/lib/python3.8/multiprocessing/process.py", line 347, in __reduce__
    raise TypeError(
TypeError: Pickling an AuthenticationString object is disallowed for security reasons

Dionysus7777777 commented 8 months ago

thanks

m-abr commented 8 months ago

I think this issue was already addressed in #3. Check if this answer solves your problem.
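
For context on the error message itself: the traceback ends inside Python's multiprocessing module, whose process authentication key refuses to be pickled outside of process spawning. The snippet below only reproduces that mechanism with the standard library; it does not use the codebase and is not the fix discussed in #3.

```python
import multiprocessing
import pickle

# Every process object carries an authkey of type AuthenticationString.
authkey = multiprocessing.current_process().authkey

try:
    pickle.dumps(authkey)
    message = None
except TypeError as exc:
    message = str(exc)

print(message)
```

The same TypeError can surface whenever an object being saved holds, directly or indirectly, a reference to a multiprocessing process.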

chance20210722 commented 7 months ago

Hello, by modifying the Get_Up sample code you provided, we have now trained Type 1 robots to kick up to 14 meters. However, we still have a few problems:

  1. Our robot always falls down after kicking the ball. We have added a penalty for falling, but it has no effect.
  2. We are not sure how to set the hyperparameters. Could you offer some suggestions?

m-abr commented 7 months ago

  1. Could you clarify when the robot falls? Does it happen during the kick or after? If it's after the kick, make sure you wait enough time after the kick is executed, during a training episode, to check if it falls. However, falling after a long kick is usually not a problem. If it fails to kick in-game but not while training, there may be a mismatch between the starting conditions in both situations. The approach to the ball should be the same during training and during an actual game.

  2. Hyperparameter selection varies depending on the skill being trained and the specific environment configuration. I recommend experimenting with small variations in hyperparameters to understand how they affect the results. Additionally, if you find that the training ends too early, consider increasing the total_steps variable.
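
To illustrate the first point, here is a minimal sketch of a reward that only penalizes a fall if the episode waits long enough to observe it. The function name, the torso-height test and both constants are assumptions chosen for illustration, and the height samples are made up; none of this is part of the codebase.

```python
def kick_reward(ball_distance, torso_heights, fall_height=0.3, fall_penalty=5.0):
    """Reward = ball distance, minus a penalty if the robot fell while waiting.

    torso_heights: torso height samples collected after the kick.
    fall_height and fall_penalty are assumed values that would need tuning.
    """
    fell = min(torso_heights) < fall_height
    return ball_distance - (fall_penalty if fell else 0.0), fell


# Made-up height samples: the robot only goes down a while after the kick.
heights = [0.45, 0.44, 0.40, 0.25, 0.10]

short_wait = kick_reward(14.0, heights[:2])  # checked too early: fall is missed
long_wait = kick_reward(14.0, heights)       # waited long enough: fall is seen
print(short_wait, long_wait)
```

If the episode ends before the fall happens, the penalty is never applied, which would make it appear to have no effect.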

chance20210722 commented 7 months ago

We are referring to the fall after the kicking motion is performed. We will keep working on it. Thank you for the advice.