utiasDSL / gym-pybullet-drones

PyBullet Gymnasium environments for single and multi-agent reinforcement learning of quadcopter control
https://utiasDSL.github.io/gym-pybullet-drones/
MIT License
1.27k stars 375 forks

Learn to fly to the target point #106

Open zhengtiantian opened 2 years ago

zhengtiantian commented 2 years ago

Hi, first of all, thank you for creating such great work.

The action type I used is vel, and I trained with A2C. The drone flies in the right direction, but it keeps flying after reaching the target. How can I make it stop? I set a bonus value of -1 on both borders.

JacopoPan commented 2 years ago

Hi @zhengtiantian , thanks!

The behaviour an RL agent learns depends on what is optimal for the underlying MDP (i.e., the environment/gym/aviary class). If the environment terminates episodes at the boundary and gives negative rewards, the agent is unlikely to learn to stop: it can simply "hack the reward" by reaching a high-reward (or least-negative-reward) point in the state space and then terminating the episode early by leaving the arena.
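To make the incentive concrete, here is a rough worked inequality under assumptions that are not taken from the repository: every step inside the arena yields a reward $r_t \le 0$, leaving the arena at step $k$ yields a one-off penalty $-p$ and ends the episode, episodes otherwise last $T$ steps, and $\gamma$ is the discount factor. Leaving at step $k$ gives a higher return than staying whenever

$$
\sum_{t=0}^{k-1} \gamma^t r_t - \gamma^k p \;>\; \sum_{t=0}^{T-1} \gamma^t r_t
\quad\Longleftrightarrow\quad
\gamma^k p \;<\; -\sum_{t=k}^{T-1} \gamma^t r_t .
$$

In words, exiting pays off when the boundary penalty is smaller than the negative reward still to be collected. For example, with $\gamma = 1$, $p = 1$ and a hypothetical $r_t = -0.1$ per step, leaving is better as soon as more than 10 steps remain.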

Depending on which type of behaviour you are trying to learn, you should carefully choose reward and done signals.
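As a minimal sketch of what such a choice could look like (this is not the gym-pybullet-drones code; `TARGET`, `ARENA_HALF_SIZE`, `MAX_STEPS`, `OUT_OF_BOUNDS_PENALTY`, `reward_and_done` and `episode_return` are all hypothetical names and values), one option is to keep the per-step reward strictly positive inside the arena and highest at the target. Leaving the arena is then never better than staying, and hovering at the target maximizes the return until the time limit.

```python
import numpy as np

# All constants below are illustrative assumptions, not values from the repository.
TARGET = np.array([0.0, 0.0, 1.0])   # target position in metres
ARENA_HALF_SIZE = 2.0                # arena is |x|, |y|, |z| <= 2 m
MAX_STEPS = 240                      # time limit (truncation), in steps
OUT_OF_BOUNDS_PENALTY = 10.0         # one-off penalty for leaving the arena


def reward_and_done(pos, step):
    """Return (reward, terminated, truncated) for one step."""
    pos = np.asarray(pos, dtype=float)
    if np.any(np.abs(pos) > ARENA_HALF_SIZE):
        # Leaving the arena ends the episode with a penalty.
        return -OUT_OF_BOUNDS_PENALTY, True, False
    # Strictly positive reward that peaks (at 1.0) exactly on the target,
    # so every extra step spent near the target adds to the return.
    reward = float(np.exp(-np.linalg.norm(pos - TARGET)))
    truncated = step + 1 >= MAX_STEPS
    return reward, False, truncated


def episode_return(trajectory):
    """Sum undiscounted rewards along a list of positions."""
    total = 0.0
    for step, pos in enumerate(trajectory):
        reward, terminated, truncated = reward_and_done(pos, step)
        total += reward
        if terminated or truncated:
            break
    return total


# Compare hovering at the target with flying through it and out of the arena.
hover = [TARGET] * MAX_STEPS
fly_through = [TARGET + np.array([0.1 * k, 0.0, 0.0]) for k in range(MAX_STEPS)]

print("hover return:      ", episode_return(hover))
print("fly-through return:", episode_return(fly_through))
```

With this shaping, reaching the target is not a termination condition at all: the episode only ends on a boundary violation or at the time limit, so the agent has a reason to stay at the target rather than continue past it.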