Open zhengtiantian opened 2 years ago
Hi @zhengtiantian , thanks!
The overall behaviour of the policy that an RL agent learns depends on what is optimal for the underlying MDP (i.e., the environment/gym/aviary class). If the environment terminates on the boundary and only gives negative rewards there, it is unlikely that an agent will learn how to stop: it can simply "hack the reward" by reaching a high-reward/low-penalty point in the state space and then terminating the episode early by leaving the arena.
Depending on which type of behaviour you are trying to learn, you should carefully choose the reward and done signals.
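To make the idea above concrete, here is a minimal, hypothetical sketch of how reward and done signals could be shaped so that "escaping the arena" is never the optimal policy. None of these names (`TARGET`, `ARENA_RADIUS`, the functions themselves) come from the library's actual API; this is only an illustration of the principle: give a dense reward for hovering near the goal, and make the boundary termination carry a penalty large enough to outweigh anything the agent could collect before leaving.

```python
import numpy as np

# Assumed, illustrative constants -- not from the library.
TARGET = np.array([0.0, 0.0, 1.0])   # hypothetical goal position
ARENA_RADIUS = 3.0                   # hypothetical arena half-width in x/y
BOUNDARY_PENALTY = -100.0            # large terminal penalty for leaving

def compute_reward(pos: np.ndarray) -> float:
    """Dense shaping: reward is positive only near TARGET, so the agent
    is paid for staying there rather than for passing through it."""
    dist = float(np.linalg.norm(pos - TARGET))
    return max(0.0, 2.0 - dist)

def compute_terminated(pos: np.ndarray) -> tuple[bool, float]:
    """Terminate only on a boundary violation, with a penalty that
    dominates any reward accumulated beforehand, so early termination
    is never a profitable 'reward hack'."""
    out_of_bounds = bool(np.any(np.abs(pos[:2]) > ARENA_RADIUS) or pos[2] < 0.0)
    if out_of_bounds:
        return True, BOUNDARY_PENALTY
    return False, 0.0
```

With this shaping, the return-maximizing behaviour is to reach the target and hover there until the time limit, because there is no positive terminal state to escape into.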
Hi: First of all, thank you for creating such a great work.
The action type I used is vel, and I trained with A2C. The drone flies in the right direction, but it keeps flying after reaching the goal. How can I make it stop? I set a reward of -1 on both borders.