utiasDSL / gym-pybullet-drones

PyBullet Gymnasium environments for single and multi-agent reinforcement learning of quadcopter control
https://utiasDSL.github.io/gym-pybullet-drones/

How to normalize observation/action space and rewards correctly? #132

Open · JaninaMattes opened this issue 1 year ago

JaninaMattes commented 1 year ago

Hello everyone,

first of all, thank you for the opportunity to use this project. I have written my own PPO algorithm and would like to test it on takeoff-aviary-v0. I have modified the learn.py script under the ./examples folder. In a first test run the values seemed to be normalised by default, thanks to the _clipAndNormalizeState(self, state) call in the Aviary class, and the learning result looked promising. In a repeated run, however, the values were no longer normalised, and I have not yet figured out how to properly normalise the observation/action space and rewards.

Could the lack of normalisation be a result of an incorrect registration of the custom gym environment?

I followed the instructions and ran `pip3 install -e .` to register the environments. The PyBullet drone environment is then created via:

```python
env = gym.make(env_id)
env.seed(seed)
env.action_space.seed(seed)
env.observation_space.seed(seed)
```
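As a workaround I am also considering normalising the observations myself with a small wrapper before handing the environment to the trainer, roughly like the sketch below (the clipping bounds are placeholders, not the actual limits used by the Aviary classes) — would that be a reasonable approach?

```python
import numpy as np
import gym


class NormalizeObservation(gym.ObservationWrapper):
    """Clip observations to [low, high] and rescale them to [-1, 1]."""

    def __init__(self, env, low=-10.0, high=10.0):  # placeholder bounds, not the Aviary limits
        super().__init__(env)
        self.low, self.high = low, high
        self.observation_space = gym.spaces.Box(low=-1.0, high=1.0,
                                                shape=env.observation_space.shape,
                                                dtype=np.float32)

    def observation(self, obs):
        # clip to the assumed physical range, then rescale to [-1, 1]
        clipped = np.clip(obs, self.low, self.high)
        scaled = 2.0 * (clipped - self.low) / (self.high - self.low) - 1.0
        return scaled.astype(np.float32)


env = NormalizeObservation(env)
```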

I pass the environment as an argument to my PPOTrainer:

```python
trainer = ppo.PPOTrainer(env, total_training_steps=1_000_000)

# train PPO
agent = trainer.create_ppo()
agent.learn()
```
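To illustrate what I mean by "no longer normalised": a minimal check along these lines (just a sketch, assuming the pre-Gymnasium step API where env.step() returns four values) would print the observation range seen over a short random rollout, so one can see whether the values stay within [-1, 1]:

```python
import numpy as np

obs = env.reset()
lo, hi = np.inf, -np.inf
for _ in range(500):
    # take random actions and track the extreme observation values seen so far
    obs, reward, done, info = env.step(env.action_space.sample())
    lo, hi = min(lo, float(obs.min())), max(hi, float(obs.max()))
    if done:
        obs = env.reset()
print(f"observation range over the rollout: [{lo:.3f}, {hi:.3f}]")
```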

I would be very grateful for any advice!

JacopoPan commented 1 year ago

Hi @JaninaMattes

thank you for reporting! I am not sure why this would be the case.

> in a repeated run the values were no longer normalised

Can you show me what you were running when you hit this problem and how you came to that conclusion?

I'd be available to organize a Zoom call if creating a bug report with an example is too laborious.