Interesting! I'm not exactly sure why this is the case... what control abstraction are you using? What's the simulation rate for the environment? What reward function are you using?
Does the policy learn any behavior at all after 10M steps? If so, what is that behavior?
Unfortunately it's hard to debug why it isn't working with limited information. You might have to play around with different hyperparameters of the learning algorithm to get it to work.
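In case it is useful, "playing around with hyperparameters" here would mean changing the arguments passed to the PPO constructor. Below is a minimal sketch, assuming the training script uses Stable-Baselines3's PPO (the values shown are just illustrative defaults, not recommendations, and may differ from what ppo_hover_train.py actually uses):

```python
from stable_baselines3 import PPO

# Illustrative knobs to vary; the actual values in ppo_hover_train.py may differ.
model = PPO(
    "MlpPolicy",
    env,                    # the Quadrotor-v0 environment built as shown later in this thread
    learning_rate=3e-4,     # try values in the 1e-4 .. 1e-3 range
    n_steps=2048,           # rollout length per policy update
    batch_size=64,
    gamma=0.99,             # discount factor
    ent_coef=0.0,           # exploration bonus; small positive values encourage exploration
    verbose=1,
)
model.learn(total_timesteps=10_000_000)
```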
Thank you for your answer. I used the following configuration:
```python
reward_function = lambda obs, act: hover_reward(obs, act, weights={'x': 1, 'v': 0.1, 'w': 0, 'u': 1e-5})
env = gym.make("Quadrotor-v0", control_mode='cmd_motor_speeds', reward_fn=reward_function,
               quad_params=quad_params, max_time=5, world=None, sim_rate=100, render_mode='None')
```
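For reference, the weights above scale separate penalty terms in the reward. A minimal sketch of the general shape such a weighted hover reward takes (not necessarily rotorpy's exact implementation, and the observation layout is an assumption):

```python
import numpy as np

def hover_reward_sketch(obs, action, weights):
    # Assumed observation layout: position (0:3), velocity (3:6),
    # quaternion (6:10), body rates (10:13); check rotorpy's env docs for the real layout.
    pos_penalty  = -weights['x'] * np.linalg.norm(obs[0:3])    # distance from the hover point
    vel_penalty  = -weights['v'] * np.linalg.norm(obs[3:6])    # translational velocity
    rate_penalty = -weights['w'] * np.linalg.norm(obs[10:13])  # body angular rates
    ctrl_penalty = -weights['u'] * np.linalg.norm(action)      # control effort
    return pos_penalty + vel_penalty + rate_penalty + ctrl_penalty
```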
After 10M steps the agent has not learned the hover task. According to ppo_hover_eval.py, the agent does not reach the final position. The agent's attitude/results are shown in this figure:
I then changed both the control abstraction and the hyperparameters of the learning algorithm in the simulation settings, but the results did not change.
You could try changing the reward function weights too.
Also, for higher abstractions (like cmd_ctatt, for instance), the multirotor environment has lower-level controllers to track those references. You can change the gains for those lower-level controllers here. This shouldn't be the problem for cmd_motor_speeds, though.
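If it helps anyone reading later, one way to act on the reward-weight suggestion is to compare short training runs over a few weight settings. The candidate values below are purely illustrative, and the hover_reward import path is an assumption (use whatever ppo_hover_train.py imports):

```python
# Import path is an assumption; use the same import that ppo_hover_train.py uses.
from rotorpy.learning.quadrotor_reward_functions import hover_reward

# Purely illustrative weight settings to compare, not recommendations.
weight_candidates = [
    {'x': 1, 'v': 0.1, 'w': 0,   'u': 1e-5},   # original setting from above
    {'x': 1, 'v': 0.5, 'w': 0.1, 'u': 1e-5},   # damp velocity and body rates harder
    {'x': 2, 'v': 0.1, 'w': 0,   'u': 1e-4},   # emphasize position error, penalize effort more
]

reward_fns = [
    (lambda obs, act, w=w: hover_reward(obs, act, weights=w))
    for w in weight_candidates
]
# Each reward function can then be passed to gym.make(..., reward_fn=...) for a short comparison run.
```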
I changed both the reward function weights and the low-level controller parameters, and additionally used higher abstractions such as cmd_ctbr.
As a result, the RL agent can be considered to have learned the hover task with the hummingbird parameters, although a small position error remains. Thanks.
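For completeness, switching the control abstraction amounts to changing the control_mode passed to gym.make. A hedged sketch of that change, with the import paths assumed to mirror the patterns in rotorpy's examples (and whatever import registers Quadrotor-v0 in ppo_hover_train.py):

```python
import gymnasium as gym  # assumption: rotorpy's Quadrotor-v0 is registered with Gymnasium
# Import paths below are assumptions, mirroring the patterns used in rotorpy's examples.
from rotorpy.vehicles.hummingbird_params import quad_params
from rotorpy.learning.quadrotor_reward_functions import hover_reward

reward_function = lambda obs, act: hover_reward(obs, act, weights={'x': 1, 'v': 0.1, 'w': 0, 'u': 1e-5})

env = gym.make(
    "Quadrotor-v0",
    control_mode='cmd_ctbr',   # collective thrust + body rates, tracked by the env's lower-level controller
    reward_fn=reward_function,
    quad_params=quad_params,
    max_time=5,
    world=None,
    sim_rate=100,
    render_mode='None',
)
```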
Looks good, nice job! Thanks for sharing.
First, thank you for developing this simulation. While using the ppo_hover_train.py code, I found that if hummingbird_params is used instead of crazyflie_params, it cannot learn the hover task. I continued the training from 1M to 10M steps.
I don't understand how changing only the physical parameters of the drone can affect the result so much. What would you recommend changing or updating at this point?