huggingface / huggingface_sb3

Additional code for Stable-baselines3 to load and upload models from the Hub.
76 stars 23 forks source link

Add support for VecNormalize #6

Closed araffin closed 2 years ago

araffin commented 2 years ago

Detect and save automatically normalization when present (using for instance unwrap_vec_normalize() helper).

simoninithomas commented 2 years ago

On it 😄 !

simoninithomas commented 2 years ago

Here's what I've done, but I'm not sure of the result 🤔 :

In package_to_hub

# If we normalizing input features, get and save the VecNormalize statistics
vecnorm = model.get_vec_normalize_env()

if vecnorm:
   # Save the VecNormalize statistics to the repo
   vecnorm_path = Path(repo_local_path) / "vec_normalize.pkl"
   vecnorm.save(vecnorm_path)

   # Load the VecNormalize statistics
   eval_env = VecNormalize.load(vecnorm_path, eval_env)

   # Do not update VecNormalization stats at test time
   eval_env.training = False
   # Reward normalization is not needed at test time
   eval_env.norm_reward = False
araffin commented 2 years ago

vecnorm = model.get_vec_normalize_env()

you should use unwrap_vec_normalize because you are usually using an eval env not connected to the model (for instance when you load the model), otherwise the rest is correct ;)

simoninithomas commented 2 years ago

So in this case it implies that I need to add a env parameter to package_to_hub. I can't use eval_env for that (vecnorm = unwrap_vec_normalize(eval_env)) given we didn't trained VecNormalization stats with it 🤔

araffin commented 2 years ago

https://github.com/huggingface/huggingface_sb3/blob/2c898f15afe3bec93dd3a7c181096b442c8cafd1/huggingface_sb3/push_to_hub.py#L246

That's already the case, no?

(vecnorm = unwrap_vec_normalize(eval_env)) given we didn't trained VecNormalization stats with it

vecnorm=None if there is no VecNormalize, you mostly need to make sure eval_env is a vec env (if not, you know that there is no normalization anyway).

araffin commented 2 years ago

Oh, looking at your implementation, I think we are thinking differently. I just wanted to save the stats if needed and keep the rest the same (assuming that eval env is already properly wrapped).

simoninithomas commented 2 years ago

Okay just to be sure:

So my idea was:

What process you were thinking about? Because I'm not sure mine is the best idea 🤔

araffin commented 2 years ago

We have env that is a vec env. When we train our model model.learn(env), we also normalize the environment (input features and reward).

yes, unless you load a model (for instance best model using eval callback)

because if I get the one from the eval_env they are useless since they were not updated during the training right?

It depends how you define eval env, but normally you should already use the stats from the training env. In the eval callback, we actually synchronize the two.

(that I assume being properly wrapped)

properly wrapped for me also mean that it has the correct stats, so you could discard model.env completely.

So my idea was:

this would work only if you do the packaging right after training and it requires that the model has an env

My idea was simpler: