Add support for VecNormalize

araffin commented 2 years ago

Detect and save automatically normalization when present (using for instance unwrap_vec_normalize() helper).

simoninithomas commented 2 years ago

On it 😄 !

simoninithomas commented 2 years ago

Here's what I've done, but I'm not sure of the result 🤔 :

In package_to_hub

# If we normalizing input features, get and save the VecNormalize statistics
vecnorm = model.get_vec_normalize_env()

if vecnorm:
   # Save the VecNormalize statistics to the repo
   vecnorm_path = Path(repo_local_path) / "vec_normalize.pkl"
   vecnorm.save(vecnorm_path)

   # Load the VecNormalize statistics
   eval_env = VecNormalize.load(vecnorm_path, eval_env)

   # Do not update VecNormalization stats at test time
   eval_env.training = False
   # Reward normalization is not needed at test time
   eval_env.norm_reward = False

We retrieve the vecnorm from the model using get_vec_normalize_env() helper since we pass the model as a parameter in package_to_hub. https://github.com/huggingface/huggingface_sb3/blob/feature-vecnorm/huggingface_sb3/push_to_hub.py
I made some tests here: https://colab.research.google.com/drive/1qnWiwBKt5EYsPB3wyQRWJ_TT-P9vKdYJ?usp=sharing
Here what it looks like: https://huggingface.co/ThomasSimonini/TEST15ppo-Walker2DBulletEnv-v0

araffin commented 2 years ago

vecnorm = model.get_vec_normalize_env()

you should use unwrap_vec_normalize because you are usually using an eval env not connected to the model (for instance when you load the model), otherwise the rest is correct ;)

simoninithomas commented 2 years ago

So in this case it implies that I need to add a env parameter to package_to_hub. I can't use eval_env for that (vecnorm = unwrap_vec_normalize(eval_env)) given we didn't trained VecNormalization stats with it 🤔

araffin commented 2 years ago

https://github.com/huggingface/huggingface_sb3/blob/2c898f15afe3bec93dd3a7c181096b442c8cafd1/huggingface_sb3/push_to_hub.py#L246

That's already the case, no?

(vecnorm = unwrap_vec_normalize(eval_env)) given we didn't trained VecNormalization stats with it

vecnorm=None if there is no VecNormalize, you mostly need to make sure eval_env is a vec env (if not, you know that there is no normalization anyway).

araffin commented 2 years ago

Oh, looking at your implementation, I think we are thinking differently. I just wanted to save the stats if needed and keep the rest the same (assuming that eval env is already properly wrapped).

simoninithomas commented 2 years ago

Okay just to be sure:

We have env that is a vec env. When we train our model model.learn(env), we also normalize the environment (input features and reward).
Now, I need to get this vec_normalize.pkl (from env) because if I get the one from the eval_env they are useless since they were not updated during the training right?
In package to hub I need to evaluate the agent, if I need to evaluate it, I need to get the eval_env (that I assume being properly wrapped) and load the vec_normalize.pkl (except I don't normalize rewards this time).

So my idea was:

Get the one from the model
Load in eval_env
Evaluate etc

What process you were thinking about? Because I'm not sure mine is the best idea 🤔

araffin commented 2 years ago

We have env that is a vec env. When we train our model model.learn(env), we also normalize the environment (input features and reward).

yes, unless you load a model (for instance best model using eval callback)

because if I get the one from the eval_env they are useless since they were not updated during the training right?

It depends how you define eval env, but normally you should already use the stats from the training env. In the eval callback, we actually synchronize the two.

(that I assume being properly wrapped)

properly wrapped for me also mean that it has the correct stats, so you could discard model.env completely.

So my idea was:

this would work only if you do the packaging right after training and it requires that the model has an env

My idea was simpler:

assume eval env is properly wrapped and with the correct stats
evaluate as usual but save the stats using the eval env

huggingface / huggingface_sb3

Add support for VecNormalize #6