Describe the bug
Let's say you train PPO2 with a vectorized environment of size 2, then save and load the model and continue training with a vectorized environment of size 1. The second call to learn() crashes with an IndexError.
Code example
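import gym
from stable_baselines import PPO2
from stable_baselines.common import make_vec_env
from stable_baselines.common.vec_env import DummyVecEnv

# Train and save with two parallel environments
env = make_vec_env('CartPole-v1', n_envs=2)
model = PPO2('MlpPolicy', env, n_steps=10, nminibatches=1)
model.learn(total_timesteps=100)
model.save("ppo2_cartpole")
del model

# Reload, switch to a single environment, and train again
model = PPO2.load("ppo2_cartpole")
test_env = DummyVecEnv([lambda: gym.make('CartPole-v1')])
model.set_env(test_env)
model.learn(total_timesteps=100)  # raises IndexError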
Error
slices = (arr[mbinds] for arr in (obs, returns, masks, actions, values, neglogpacs))
E IndexError: index 13 is out of bounds for axis 0 with size 10
stable_baselines/ppo2/ppo2.py:362: IndexError
System Info
development build from source (this error should be easy to reproduce with any build)
Python 3.7
Additional context
I have prepped a pull request with the patch and a corresponding test. The test fails when the patch is not applied. It may come down to opinion whether this is an error or not, but I think it is, since n_batch depends on n_envs and there does not seem to be a mechanism for updating n_batch outside of initialization. I am interested to hear the thoughts of the maintainers on this proposed patch.
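To illustrate the mechanism, here is a minimal standalone sketch of what I believe is happening. It assumes n_batch is computed once as n_envs * n_steps and that minibatch indices are drawn from range(n_batch). The helper minibatch_indices and the plain-list rollout are hypothetical stand-ins written for this issue, not code from stable_baselines.

```python
import random

N_STEPS = 10

def minibatch_indices(n_batch, nminibatches, rng):
    """Shuffle range(n_batch) and split it into nminibatches chunks.

    Hypothetical helper that mimics how I understand the PPO2 update loop
    to pick sample indices; it is not the library's code.
    """
    inds = list(range(n_batch))
    rng.shuffle(inds)
    batch_size = n_batch // nminibatches
    return [inds[start:start + batch_size] for start in range(0, n_batch, batch_size)]

rng = random.Random(0)

# n_batch as computed at initialization, when n_envs was 2
stale_n_batch = 2 * N_STEPS
# Rollout collected after set_env() with a single environment: only 10 samples
rollout = [0.0] * (1 * N_STEPS)

# Indexing the 10-sample rollout with indices drawn from range(20) must fail,
# because the single minibatch contains every index from 0 to 19.
mbinds = minibatch_indices(stale_n_batch, 1, rng)[0]
try:
    [rollout[i] for i in mbinds]
    stale_failed = False
except IndexError:
    stale_failed = True

# Recomputing n_batch from the current number of environments avoids the crash
fresh_n_batch = 1 * N_STEPS
mbinds = minibatch_indices(fresh_n_batch, 1, rng)[0]
fresh_slice = [rollout[i] for i in mbinds]
```

With the stale value of 20 the lookup always raises IndexError. With n_batch recomputed as 10 it succeeds, which is roughly what the proposed patch aims for when the environment is replaced.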