PyTorch implementation of Advantage Actor Critic (A2C), Proximal Policy Optimization (PPO), Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation (ACKTR) and Generative Adversarial Imitation Learning (GAIL).
The problem occurs in environments that use obs_rms: the VecNormalize class in envs.py refers to the attribute self.ob_rms, but the class it gets from stable_baselines3 (from stable_baselines3.common.vec_env.vecnormalize import **VecNormalize as VecNormalize**) names that attribute self.obs_rms, so the two names do not refer to the same variable.
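One possible workaround is to look the statistics up under either attribute name. The sketch below is only an illustration of that idea, not the repository's fix: `get_obs_rms`, `LegacyNormalize` and `Sb3StyleNormalize` are hypothetical names, and the two small classes are stand-ins so the snippet runs without stable_baselines3 installed.

```python
def get_obs_rms(venv):
    """Return the observation running statistics of a wrapper, or None.

    Tries the stable_baselines3 spelling (obs_rms) first, then the
    older spelling used in envs.py (ob_rms).
    """
    for name in ("obs_rms", "ob_rms"):
        rms = getattr(venv, name, None)
        if rms is not None:
            return rms
    return None


# Hypothetical stand-ins for the two wrappers; the string values are
# placeholders for the real running mean/variance objects.
class LegacyNormalize:
    def __init__(self):
        self.ob_rms = "legacy-stats"


class Sb3StyleNormalize:
    def __init__(self):
        self.obs_rms = "sb3-stats"


class NoNormalize:
    pass


legacy_stats = get_obs_rms(LegacyNormalize())
sb3_stats = get_obs_rms(Sb3StyleNormalize())
missing_stats = get_obs_rms(NoNormalize())
```

Code that saves or restores the normalization statistics could call such a helper instead of hard-coding one attribute name, assuming the stored objects are otherwise compatible.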