hill-a / stable-baselines

A fork of OpenAI Baselines, implementations of reinforcement learning algorithms
http://stable-baselines.readthedocs.io/
MIT License
4.14k stars 723 forks

VecNormalize for multiple training environments? #1114

Open jdshaolinstar opened 3 years ago

jdshaolinstar commented 3 years ago

Hello. I've read the docs on how to use VecNormalize for a single environment, but it's unclear to me how to approach VecNormalize when training an agent across multiple environments.

Given I want to save and load a model trained on 3 different environments, each with its own dataset, which option makes more sense?

a.) save and load 3 different VecNormalize environment files?

```python
env.save(env_1_vec_file) .... env = VecNormalize.load(env_1_vec_file, env3)
env.save(env_2_vec_file) .... env = VecNormalize.load(env_2_vec_file, env3)
env.save(env_3_vec_file) .... env = VecNormalize.load(env_3_vec_file, env3)
```

b.) save and load all envs to 1 shared VecNormalize file?

```python
env.save(shared_vec_file) .... env = VecNormalize.load(shared_vec_file, env3)
env.save(shared_vec_file) .... env = VecNormalize.load(shared_vec_file, env3)
env.save(shared_vec_file) .... env = VecNormalize.load(shared_vec_file, env3)
```

araffin commented 3 years ago

3 different environments each with its own dataset.....

If you have datasets, you can probably compute the statistics and normalize in advance, so you should not need VecNormalize, no?
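If the datasets are fixed, the statistics can be computed once offline, something like this (plain NumPy on a hypothetical dataset, not a stable-baselines feature):

```python
import numpy as np

# Hypothetical fixed dataset: rows are observations, columns are features.
dataset = np.random.RandomState(42).uniform(0.0, 500.0, size=(10_000, 4))

# Compute the statistics once over the whole dataset...
mean = dataset.mean(axis=0)
std = dataset.std(axis=0) + 1e-8  # avoid division by zero

# ...then apply the same fixed normalization to every observation the
# environment emits, instead of estimating running stats with VecNormalize.
normalized = (dataset - mean) / std
print(normalized.mean(axis=0).round(6), normalized.std(axis=0).round(6))
```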

jdshaolinstar commented 3 years ago

Hmm.... That's actually a great idea... Before using stable-baselines, I was normalizing the data myself. VecNormalize just seems like an intelligent wrapper that does a lot of good things. The reason I don't normalize the data in advance is that I want the ability to receive live (unknown scale) data in a production setting, not just from an existing dataset. So I guess I could normalize the data before feeding it into the env, but I feel I would just be reinventing what VecNormalize probably does well already. Also, I believe VecNormalize also takes care of reward normalization? I'm using it alongside the MlpLnLstmPolicy (layer-normalized LSTM) policy.
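(For reference, VecNormalize does normalize rewards when `norm_reward=True`, scaling them by a running estimate of the standard deviation of the discounted return. A rough sketch of that idea, assuming gamma=0.99 and deliberately simplified relative to the actual stable-baselines implementation:)

```python
import numpy as np

def normalize_rewards(rewards, gamma=0.99):
    """Rough sketch of reward normalization: scale each reward by a
    running std of the discounted return (the idea behind VecNormalize's
    norm_reward=True, not its exact implementation)."""
    discounted_return = 0.0
    return_history = []
    scaled = []
    for r in rewards:
        # accumulate the discounted return seen so far
        discounted_return = discounted_return * gamma + r
        return_history.append(discounted_return)
        std = np.std(return_history)
        scaled.append(r / std if std > 0 else r)
    return scaled

raw_rewards = [1.0, 100.0, -50.0, 1.0]
print(normalize_rewards(raw_rewards))
```

Even with raw rewards spanning two orders of magnitude, the scaled rewards stay in a narrow range, which is the point of the normalization.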

jdshaolinstar commented 3 years ago

@araffin I'm assuming that VecNormalize applies scaling to the inputs the environment receives and has nothing to do with what's saved in the model, so loading different normalization statistics between environments probably doesn't matter too much (I hope). So perhaps that question is solved. I'm not sure if this should be a different topic, but regarding multiprocessing: I've read that with recurrent networks, the model has to be tested with the same number of envs it was trained on. If I spawn 5 separate Python processes and load the same model, could I save them each to the same model file after their individual episodes? Or is there some shared logic between steps that actually needs to happen? Asking the same question a different way: when I save a training run to a model file, are the weights being intelligently blended together, regardless of whether I'm training the same model in 5 different processes, or will the newer process overwrite the progress of the concurrent model saves?

araffin commented 3 years ago

when I save a training to a model file, are the weights being intelligently blended together, regardless of if I'm training the same model in 5 different processes, or will the newer process overwrite the progress of the concurrent model saves?

When using VecEnv, only the environment computation is split across processes; the model and gradient updates are done in a single process. When using MPI (with PPO1, for instance), synchronization is done after each gradient step.