hill-a / stable-baselines

A fork of OpenAI Baselines, implementations of reinforcement learning algorithms
http://stable-baselines.readthedocs.io/
MIT License
4.14k stars 723 forks

Fix re-training with different number of environments #1133

Closed balisujohn closed 3 years ago

balisujohn commented 3 years ago

A patch fixing the ability of a loaded PPO2 model to continue training on a vectorized environment with a different number of environments than the one it was initially trained on.

Description

A test was added to test_ppo2.py which checks that a loaded PPO2 model can train on a vectorized environment with a different number of environments than the one it was initially trained on. set_env was overridden in ppo2.py to call the set_env of the superclass and then explicitly update n_batch to n_envs * n_steps.
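The override described above can be sketched roughly as follows. This is a hypothetical, heavily simplified mock (the real PPO2 and its base class in stable-baselines carry many more attributes); it only shows how the override recomputes n_batch after the base class swaps in the new environment.

```python
from types import SimpleNamespace

# Hypothetical, simplified stand-ins for the real classes; only the
# attributes relevant to this fix (n_envs, n_steps, n_batch) are modeled.
class ActorCriticRLModel:
    def set_env(self, env):
        # The base class stores the new VecEnv and its environment count.
        self.env = env
        self.n_envs = env.num_envs

class PPO2(ActorCriticRLModel):
    def __init__(self, n_envs, n_steps):
        self.n_envs = n_envs
        self.n_steps = n_steps
        self.n_batch = n_envs * n_steps

    def set_env(self, env):
        # Call the superclass first, then recompute n_batch so that the
        # rollout batch size matches the new number of environments.
        super().set_env(env)
        self.n_batch = self.n_envs * self.n_steps

model = PPO2(n_envs=4, n_steps=10)          # n_batch == 40
model.set_env(SimpleNamespace(num_envs=1))  # mimic a 1-env VecEnv
print(model.n_batch)                        # recomputed to 10
```

Without the override, n_batch would stay at its original value and no longer match the rollout shapes produced by the smaller VecEnv.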

Motivation and Context

Prior to this patch, the code in the linked issue would fail; this patch fixes that. Closes #1132


balisujohn commented 3 years ago

Also possibly worth noting: I tried to reproduce this error with stable-baselines3, since development is more active there, and it seems that stable-baselines3 can handle at least a simple version of this situation without crashing.

```python
# derived from https://github.com/DLR-RM/stable-baselines3
import gym

from stable_baselines3 import PPO
from stable_baselines3.common.vec_env import DummyVecEnv
from stable_baselines3.common.env_util import make_vec_env

env = make_vec_env("CartPole-v1", n_envs=4)

model = PPO("MlpPolicy", env, verbose=1, n_steps=10)
model.learn(total_timesteps=10000)

model.save("test_model")
del model

env = make_vec_env("CartPole-v1", n_envs=1)

model = PPO.load("test_model", env)

model.learn(total_timesteps=10000)
```

(ran without error)

Miffyli commented 3 years ago

Sorry for the delay! Indeed SB3 is more refined in this regard.

Seems like this issue could happen with other algos so they should be fixed as well (e.g. A2C).

@araffin thoughts on merging this? Seems like an appropriate maintenance mode fix.

araffin commented 3 years ago

> Sorry for the delay! Indeed SB3 is more refined in this regard.

Yes, SB3 is the recommended solution.

> Seems like this issue could happen with other algos so they should be fixed as well (e.g. A2C).

Yes, this is probably the case for all A2C-like algorithms (A2C, ACER, ACKTR, and maybe TRPO).

It seems we also need to fix the type annotations (the new version of pytype does not like when things are not properly marked as optional).

> @araffin thoughts on merging this? Seems like an appropriate maintenance mode fix.

I would be happy to merge it after the pytype issues are fixed (you can keep this PR to fix them).

Regarding the test, please use the tmp_path pytest argument (an automatic temporary folder) rather than saving/loading in the same folder (I know that some tests do not use it, but it should be the case, as in SB3 tests).
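The tmp_path suggestion looks roughly like the following hypothetical sketch. DummyModel and the test name are invented stand-ins for the real PPO2 save/load round trip; the point is only that the model file lands in pytest's per-test temporary directory instead of the working directory.

```python
import pickle

# DummyModel is a hypothetical stand-in for PPO2: it models only the
# save/load round trip, not any actual training.
class DummyModel:
    def __init__(self, n_envs):
        self.n_envs = n_envs

    def save(self, path):
        with open(path, "wb") as f:
            pickle.dump({"n_envs": self.n_envs}, f)

    @classmethod
    def load(cls, path, n_envs=None):
        with open(path, "rb") as f:
            state = pickle.load(f)
        model = cls(state["n_envs"])
        if n_envs is not None:
            # Mimic set_env adopting a differently sized environment.
            model.n_envs = n_envs
        return model

def test_retrain_different_n_envs(tmp_path):
    # tmp_path is pytest's built-in fixture: a pathlib.Path pointing at a
    # fresh per-test temporary directory, cleaned up automatically.
    model = DummyModel(n_envs=4)
    save_path = tmp_path / "test_model.pkl"
    model.save(save_path)
    del model

    loaded = DummyModel.load(save_path, n_envs=1)
    assert loaded.n_envs == 1
```

Because every test gets its own directory, saved files from one test can never collide with or leak into another test run.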