utiasDSL / gym-pybullet-drones

PyBullet Gymnasium environments for single and multi-agent reinforcement learning of quadcopter control
https://utiasDSL.github.io/gym-pybullet-drones/

How to normalize observation/action space and rewards correctly? #132

Open · JaninaMattes opened this issue 1 year ago

JaninaMattes commented 1 year ago

Hello everyone,

first of all, thank you for the opportunity to use this project. I have written my own PPO algorithm and would like to test it on takeoff-aviary-v0. I have modified the learn.py script under the ./examples folder. In a first test run the values seemed to be normalised by default, thanks to the _clipAndNormalizeState(self, state) call in the Aviary class, and the learning result looked promising. In a repeated run, however, the values were no longer normalised, and I have not yet figured out how to properly normalise the observation/action space and rewards.

Could the lack of normalisation be a result of an incorrect registration of the custom gym environment?

I followed the instructions and ran `pip3 install -e .` to register the environments. The PyBullet drone environment is then created via:

```python
env = gym.make(env_id)
env.seed(seed)
env.action_space.seed(seed)
env.observation_space.seed(seed)
```
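As a workaround I am also considering normalising the observations myself with a small wrapper before handing the environment to the trainer, roughly like the sketch below (the clipping bounds are placeholders, not the actual limits used by the Aviary classes) — would that be a reasonable approach?

```python
import numpy as np
import gym


class NormalizeObservation(gym.ObservationWrapper):
    """Clip observations to [low, high] and rescale them to [-1, 1]."""

    def __init__(self, env, low=-10.0, high=10.0):  # placeholder bounds, not the Aviary limits
        super().__init__(env)
        self.low, self.high = low, high
        self.observation_space = gym.spaces.Box(low=-1.0, high=1.0,
                                                shape=env.observation_space.shape,
                                                dtype=np.float32)

    def observation(self, obs):
        # clip to the assumed physical range, then rescale to [-1, 1]
        clipped = np.clip(obs, self.low, self.high)
        scaled = 2.0 * (clipped - self.low) / (self.high - self.low) - 1.0
        return scaled.astype(np.float32)


env = NormalizeObservation(env)
```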

I pass the environment as an argument to my PPOTrainer:

```python
trainer = ppo.PPOTrainer(env, total_training_steps=1_000_000)

# train PPO
agent = trainer.create_ppo()
agent.learn()
```
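To illustrate what I mean by "no longer normalised": a minimal check along these lines (just a sketch, assuming the pre-Gymnasium step API where env.step() returns four values) would print the observation range seen over a short random rollout, so one can see whether the values stay within [-1, 1]:

```python
import numpy as np

obs = env.reset()
lo, hi = np.inf, -np.inf
for _ in range(500):
    # take random actions and track the extreme observation values seen so far
    obs, reward, done, info = env.step(env.action_space.sample())
    lo, hi = min(lo, float(obs.min())), max(hi, float(obs.max()))
    if done:
        obs = env.reset()
print(f"observation range over the rollout: [{lo:.3f}, {hi:.3f}]")
```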

I would be very grateful for any advice!

JacopoPan commented 1 year ago

Hi @JaninaMattes

thank you for reporting! I am not sure why this would be the case.

> in a repeated run the values were no longer normalised

Can you show me what you were running when you hit this problem and how you came to that conclusion?

I'd be available to organize a Zoom call if creating a bug report with an example is too laborious.