hill-a / stable-baselines

A fork of OpenAI Baselines, implementations of reinforcement learning algorithms
http://stable-baselines.readthedocs.io/
MIT License

Fix for PPO2 when loading a model and then training with a vectorized environment of a different length #1132

Closed: balisujohn closed this issue 3 years ago

balisujohn commented 3 years ago

Describe the bug Let's say you train PPO2 with a vectorized environment of length 2, then save and load the model and continue training with a vectorized environment of length 1. A crash occurs.

Code example


    import gym

    from stable_baselines import PPO2
    from stable_baselines.common import make_vec_env
    from stable_baselines.common.vec_env import DummyVecEnv

    # Train with a vectorized environment of length 2.
    env = make_vec_env('CartPole-v1', n_envs=2)
    model = PPO2('MlpPolicy', env, n_steps=10, nminibatches=1)

    model.learn(total_timesteps=100)
    model.save("ppo2_cartpole")

    del model

    # Reload and continue training with a vectorized environment of length 1.
    model = PPO2.load("ppo2_cartpole")
    test_env = DummyVecEnv([lambda: gym.make('CartPole-v1')])

    model.set_env(test_env)
    model.learn(total_timesteps=100)  # raises IndexError (see below)

Error

slices = (arr[mbinds] for arr in (obs, returns, masks, actions, values, neglogpacs))
E   IndexError: index 13 is out of bounds for axis 0 with size 10

stable_baselines/ppo2/ppo2.py:362: IndexError

System Info development build from source (this error should be easy to reproduce with any build), Python 3.7

Additional context I have prepared a pull request with the patch and a corresponding test; the test fails when the patch is not applied. Whether this counts as a bug may come down to opinion, but I think it does: n_batch is computed from n_envs at initialization (here n_steps=10 with n_envs=2 gives n_batch=20, so the shuffled minibatch indices can reach 19, while the new single-env rollout holds only 10 transitions), and there does not seem to be a mechanism for updating n_batch outside of initialization. I am interested to hear the maintainers' thoughts on this proposed patch.
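
For illustration, a minimal sketch of the kind of mechanism described above. It is hypothetical (the actual pull request may implement this differently) and assumes, as in the stable-baselines source, that BaseRLModel.set_env refreshes n_envs from the new environment and that PPO2 fixes n_batch = n_envs * n_steps at initialization:

    from stable_baselines import PPO2

    class PatchedPPO2(PPO2):
        # Hypothetical subclass for illustration only; the real fix would
        # live inside stable-baselines itself.

        def set_env(self, env):
            # BaseRLModel.set_env updates self.n_envs from env.num_envs.
            super().set_env(env)
            # Recompute n_batch, which __init__ fixed at n_envs * n_steps,
            # so that minibatch indices stay within the new rollout size.
            self.n_batch = self.n_envs * self.n_steps

Whether recomputing n_batch alone suffices depends on how the rest of the training graph was built, so treat this purely as a sketch of the idea rather than as the patch in the pull request.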