Closed: gulbinkirikoglu closed this issue 1 month ago.
How have you defined the reward function? What does the code of the _compute_reward method look like?
Thank you for your help; I found the error. The issue wasn't with the reward function. I was treating the actions as discrete values but converting them to arrays and storing those arrays in the experience. Once I stored the discrete actions directly instead, the problem was resolved.
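The fix described above can be sketched as follows. This is a minimal sketch assuming a DQN-style agent with a replay buffer; the names ACTION_TABLE, to_env_action, Transition and ReplayBuffer are hypothetical and are not part of gym-pybullet-drones. The buffer keeps the integer action index, and the index is turned into the four-motor array only when it is passed to the environment.

```python
import random
from collections import deque
from typing import Deque, List, NamedTuple, Tuple

# Hypothetical table mapping a discrete action index to the 4-motor command
# (values from the question: all motors down or all motors up).
ACTION_TABLE: Tuple[Tuple[float, ...], ...] = (
    (-1.0, -1.0, -1.0, -1.0),  # index 0: descend
    (1.0, 1.0, 1.0, 1.0),      # index 1: ascend
)


def to_env_action(index: int) -> List[List[float]]:
    """Convert a discrete index to the command passed to env.step()."""
    # Assumed layout: one row per drone, one column per motor.
    return [list(ACTION_TABLE[index])]


class Transition(NamedTuple):
    obs: Tuple[float, float]       # (z, vz)
    action: int                    # discrete index, NOT the motor array
    reward: float
    next_obs: Tuple[float, float]
    done: bool


class ReplayBuffer:
    def __init__(self, capacity: int) -> None:
        # Oldest transitions are dropped once capacity is reached.
        self.buffer: Deque[Transition] = deque(maxlen=capacity)

    def push(self, t: Transition) -> None:
        self.buffer.append(t)

    def sample(self, batch_size: int) -> List[Transition]:
        # Uniform sampling without replacement.
        return random.sample(list(self.buffer), batch_size)

    def __len__(self) -> int:
        return len(self.buffer)


# Usage: the index goes into the buffer, the array goes to the environment.
buf = ReplayBuffer(capacity=10_000)
action_index = 1
env_action = to_env_action(action_index)  # would be passed to env.step(...)
buf.push(Transition((0.9, -0.1), action_index, 0.5, (0.92, 0.05), False))

# In a DQN update, the stored index selects the Q-value directly.
q_values = [0.2, 0.7]  # hypothetical network output for one observation
chosen_q = q_values[buf.sample(1)[0].action]
```

Storing the index keeps the Q-value lookup consistent with what the agent actually chose; if the motor array is stored instead, code that indexes Q-values by the stored action receives an array rather than an index.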
I'm attempting to run RL training with discrete actions in the HoverAviary environment. My goal is to take the z position and z velocity from the observation space as input and control the drone's up and down movement using just two actions, the [-1 -1 -1 -1] and [1 1 1 1] arrays. I'm using a reward function I defined myself, but as the drone falls below a height of 1, the reward increases. What could be the reason for this?
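One way to rule the reward function in or out for the symptom in the question is to evaluate it over a sweep of heights below the target and confirm it falls as the drone descends. The sketch below is hypothetical: height_reward is an illustrative distance-based reward, not HoverAviary's own _compute_reward, and TARGET_Z = 1.0 simply mirrors the height mentioned in the question. buggy_reward shows a sign error that would produce exactly the reported behaviour.

```python
from typing import Callable

TARGET_Z = 1.0  # target hover height from the question (units assumed metres)


def height_reward(z: float, vz: float) -> float:
    """Hypothetical reward: 1.0 at the target height, lower with height error."""
    # Penalise distance to the target and, lightly, vertical speed.
    return 1.0 - abs(z - TARGET_Z) - 0.1 * abs(vz)


def buggy_reward(z: float, vz: float) -> float:
    """Hypothetical sign error: reward grows as the drone drops below target."""
    return TARGET_Z - z


def check_monotone_below_target(
    reward_fn: Callable[[float, float], float], target: float, steps: int = 10
) -> bool:
    """True if reward strictly decreases as z goes from target down to 0 (vz = 0)."""
    zs = [target - i * target / steps for i in range(steps + 1)]
    rewards = [reward_fn(z, 0.0) for z in zs]
    return all(a > b for a, b in zip(rewards, rewards[1:]))


ok = check_monotone_below_target(height_reward, TARGET_Z)
bad = check_monotone_below_target(buggy_reward, TARGET_Z)
```

If a check like this passes for the real reward function while training still behaves as described, the problem is more likely elsewhere in the pipeline, such as how actions are stored in the experience.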