spencerfolk / rotorpy

A multirotor simulator with aerodynamics for education and research.
MIT License

Problem in training hover control policy with different quad_params #10

Closed atahirkarasahin closed 2 weeks ago

atahirkarasahin commented 2 weeks ago

First, thank you for developing this simulator. While using the ppo_hover_train.py code, I found that if hummingbird_params is used instead of crazyflie_params (see the sketch below), the policy cannot learn the hover task. I extended the training from 1M to 10M steps.
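
For reference, a minimal sketch of the swap being described; the exact import paths are assumed from the rotorpy vehicle definitions and may differ in your version:

```python
# Illustrative only: module paths depend on the rotorpy version/layout.
# Original (Crazyflie) parameters used by ppo_hover_train.py:
# from rotorpy.vehicles.crazyflie_params import quad_params
# Swapped-in Hummingbird parameters:
from rotorpy.vehicles.hummingbird_params import quad_params
```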

I don't understand how changing only the physical parameters of the drone can affect the result so much. What would you recommend changing or updating at this point?

spencerfolk commented 2 weeks ago

Interesting! I'm not exactly sure why this is the case... what control abstraction are you using? What's the simulation rate for the environment? What reward function are you using?

Does the policy learn any behavior at all after 10M steps? If so, what is that behavior?

Unfortunately it's hard to debug why it isn't working with limited information. You might have to play around with different hyperparameters of the learning algorithm to get it to work.

atahirkarasahin commented 2 weeks ago

Thank you for your answer. I used the following configuration:

reward_function = lambda obs, act: hover_reward(obs, act, weights={'x': 1, 'v': 0.1, 'w': 0, 'u': 1e-5})

env = gym.make("Quadrotor-v0", control_mode='cmd_motor_speeds', reward_fn=reward_function, quad_params=quad_params, max_time=5, world=None, sim_rate=100, render_mode='None')

After 10M steps the agent still does not learn the hover task. According to ppo_hover_eval.py, the agent does not reach the final position. The agent's behavior is shown in this figure:

[Figure: position_vs_time]

I then changed both the control abstraction and the hyperparameters of the learning algorithm in the simulation settings, but the results did not change.
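
As an illustration of what "changing the hyperparameters" can look like here, a minimal sketch assuming the agent is built with stable-baselines3 PPO (as the rotorpy ppo_hover_train.py example appears to do); the values below are placeholders, not the settings actually tried:

```python
# Hedged sketch: assumes stable-baselines3 PPO; values are illustrative only.
from stable_baselines3 import PPO

model = PPO(
    "MlpPolicy",
    env,                 # the Quadrotor-v0 env created above
    learning_rate=1e-4,  # e.g. try a lower learning rate
    n_steps=2048,
    batch_size=64,
    gamma=0.99,
    verbose=1,
)
model.learn(total_timesteps=10_000_000)  # 10M steps, as in the runs above
```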

spencerfolk commented 2 weeks ago

You could try changing the reward function weights too.
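
For example, a possible re-weighting to experiment with, assuming (from the hover_reward call above) that 'x', 'v', 'w', and 'u' weight the position, velocity, body-rate, and actuation terms; the values are only illustrative:

```python
# Illustrative re-weighting of the hover reward; keys follow the hover_reward
# call shown earlier, and the values are placeholders to tune.
reward_function = lambda obs, act: hover_reward(
    obs, act, weights={'x': 1, 'v': 0.5, 'w': 0.1, 'u': 1e-5}
)
```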

Also, for higher abstractions (like cmd_ctatt for instance), the multirotor environment has lower level controllers to track those references. You can change the gains for those lower level controllers here. This shouldn't be the problem though for cmd_motor_speeds.
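
A minimal sketch of switching the environment to one of those higher abstractions, reusing the gym.make call from above ('cmd_ctatt' is one of the control modes mentioned here; the other arguments are unchanged):

```python
# Sketch: same environment, but commanding collective thrust + attitude instead
# of raw motor speeds; the env's internal low-level controller tracks this.
env = gym.make(
    "Quadrotor-v0",
    control_mode='cmd_ctatt',  # was 'cmd_motor_speeds'
    reward_fn=reward_function,
    quad_params=quad_params,
    max_time=5,
    world=None,
    sim_rate=100,
    render_mode='None',
)
```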

atahirkarasahin commented 2 weeks ago

I changed both the reward function weights and the low-level controller parameters, and additionally used higher abstractions such as cmd_ctbr.

[Figure: position_vs_time]

As a result, the RL agent can be considered to have learned the hover task with the hummingbird parameters, although a small position error remains. Thanks.

spencerfolk commented 2 weeks ago

Looks good, nice job! Thanks for sharing.