utiasDSL / gym-pybullet-drones

PyBullet Gymnasium environments for single and multi-agent reinforcement learning of quadcopter control
https://utiasDSL.github.io/gym-pybullet-drones/
MIT License

Why might my rewards be inversely proportional to the target height in the HoverAviary environment? #214

Closed: gulbinkirikoglu closed this issue 1 month ago

gulbinkirikoglu commented 1 month ago

I'm attempting to run discrete-action RL training with the HoverAviary environment. My goal is to take the z position and z velocity from the observation space as inputs and control the drone's up and down movement with just two actions, the arrays [-1, -1, -1, -1] and [1, 1, 1, 1]. I'm using a reward function I defined myself, but as the drone falls below a height of 1, the reward increases. What could be the reason for this?
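
For context, one way to expose HoverAviary's continuous action interface as a two-valued discrete space is a `gymnasium.ActionWrapper`. This is only an illustrative sketch, not code from the issue or the repository; it assumes HoverAviary's default normalized-RPM actions with shape (1, 4), and `DiscreteHover` is a hypothetical name.

```python
import numpy as np
import gymnasium as gym
from gym_pybullet_drones.envs.HoverAviary import HoverAviary

class DiscreteHover(gym.ActionWrapper):
    """Expose HoverAviary through a two-valued discrete action space."""
    def __init__(self, env):
        super().__init__(env)
        self.action_space = gym.spaces.Discrete(2)
        # One row per drone: index 0 -> full negative, index 1 -> full positive.
        self._lookup = {
            0: np.array([[-1.0, -1.0, -1.0, -1.0]]),
            1: np.array([[1.0, 1.0, 1.0, 1.0]]),
        }

    def action(self, act):
        # Convert the discrete index into the array the base env expects.
        return self._lookup[int(act)]

env = DiscreteHover(HoverAviary())
obs, info = env.reset()
obs, reward, terminated, truncated, info = env.step(env.action_space.sample())
```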

piratax007 commented 1 month ago

How have you defined the reward function? What does the code of your _compute_reward method look like?
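
For readers following along, a custom height-tracking reward would typically be provided by subclassing HoverAviary and overriding its reward hook (named `_computeReward` in the repository's aviary classes). The following is only a sketch of what such an override might look like; `HeightHover` and `TARGET_Z` are illustrative names, not part of the library.

```python
from gym_pybullet_drones.envs.HoverAviary import HoverAviary

class HeightHover(HoverAviary):
    """HoverAviary with a reward that tracks a fixed target height."""
    TARGET_Z = 1.0  # assumed target height in meters

    def _computeReward(self):
        state = self._getDroneStateVector(0)  # kinematic state of drone 0
        z = state[2]                          # z position
        # Penalize distance from the target height in either direction,
        # so the reward cannot increase as the drone drops below 1 m.
        return -abs(self.TARGET_Z - z)
```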

gulbinkirikoglu commented 1 month ago

Thank you for your help, I found the error. The issue wasn't with the reward function. I was treating the actions as discrete values, converting them to arrays, and then storing those arrays in the experience buffer. Once I stored the discrete actions directly instead, the problem was resolved.
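
A minimal sketch of the fix described above, assuming a DQN-style replay buffer: convert the discrete index to the environment's array only when stepping, and store the index itself in the experience tuple. `to_env_action`, `policy`, and `buffer` are hypothetical names used only for illustration.

```python
import numpy as np

def to_env_action(idx):
    """Convert a discrete index to the (1, 4) array HoverAviary expects."""
    return np.array([[-1.0] * 4]) if idx == 0 else np.array([[1.0] * 4])

# Buggy pattern: the converted array was what ended up in the experience
# tuple, so the learner was trained on the wrong action representation.
#   buffer.append((obs, to_env_action(idx), reward, next_obs, done))

# Fixed pattern: step the environment with the converted array, but store
# the raw discrete index in the experience tuple.
#   idx = policy(obs)
#   next_obs, reward, terminated, truncated, info = env.step(to_env_action(idx))
#   buffer.append((obs, idx, reward, next_obs, terminated or truncated))
```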