hill-a / stable-baselines

A fork of OpenAI Baselines, implementations of reinforcement learning algorithms
http://stable-baselines.readthedocs.io/
MIT License

[question] VecNormalize with hyper parameter tuning #1122

Closed aleksanderhan closed 3 years ago

aleksanderhan commented 3 years ago

Hi, I'm trying to do some hyperparameter tuning, but I'm getting this error: ValueError("Trying to set venv of already initialized VecNormalize wrapper."). How am I supposed to use the normalization statistics from the training phase during validation without saving it to file during every trial?

def objective_fn(trial):
    model_params = optimize_ppo(trial)

    train_env, validation_env = initialize_envs()
    norm_env = VecNormalize(train_env, norm_obs=True, norm_reward=True, training=True)

    model = PPO(policy, 
                norm_env,
                device=device,
                **model_params)

    train_maxlen = len(train_env.get_attr('df')[0].index) - 1
    try:
        model.learn(train_maxlen)
    except Exception as error:
        print(error)
        raise optuna.structs.TrialPruned()

    norm_env.set_venv(validation_env)
    norm_env.training = False
    norm_env.norm_reward = False

    mean_reward, _ = evaluate_policy(model, norm_env, n_eval_episodes=5)

    if mean_reward == 0:
        raise optuna.structs.TrialPruned()

    return -mean_reward

aleksanderhan commented 3 years ago

Sorry for the code formatting; I'm not sure why it doesn't render properly.

Miffyli commented 3 years ago

Check the post markdown code, I updated it with prettier formatting :).

@araffin can you comment on this?

PS: I recommend that you also take a look at stable-baselines3, which is more actively maintained.

aleksanderhan commented 3 years ago

Thanks for that! I am actually using SB3, so I must have posted this in the wrong repo. Should I close this and open a new question in the SB3 repo?

Miffyli commented 3 years ago

I think the API is the same between the two repos, so you can keep the post open here for now and wait until araffin replies :).

araffin commented 3 years ago

How am I supposed to use the normalization statistics from the training phase during validation without saving it to file during every trial?

You will find your answer here ;): https://github.com/DLR-RM/stable-baselines3/issues/473 (we have sync_envs_normalization for that).

https://github.com/DLR-RM/stable-baselines3/blob/75b6f3b3b0f207456d9dcac2c6e86e8e2a22115f/stable_baselines3/common/vec_env/__init__.py#L59-L72
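To illustrate the idea behind syncing, here is a minimal, library-free sketch of copying running normalization statistics from a training wrapper to a separate evaluation wrapper, instead of re-pointing one wrapper at a new environment. The names `RunningStats`, `Normalizer`, and `sync_stats` are hypothetical stand-ins invented for this example; this is not the SB3 implementation, only an approximation of the concept.

```python
import copy
import math


class RunningStats:
    """Hypothetical running mean/variance tracker (Welford's algorithm)."""

    def __init__(self) -> None:
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0  # sum of squared deviations from the mean

    def update(self, x: float) -> None:
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)

    @property
    def var(self) -> float:
        # Population variance; fall back to 1.0 before any data is seen.
        return self.m2 / self.count if self.count > 0 else 1.0


class Normalizer:
    """Hypothetical wrapper: updates its statistics only while training."""

    def __init__(self, training: bool = True, eps: float = 1e-8) -> None:
        self.obs_stats = RunningStats()
        self.training = training
        self.eps = eps

    def normalize(self, obs: float) -> float:
        if self.training:
            self.obs_stats.update(obs)
        return (obs - self.obs_stats.mean) / math.sqrt(self.obs_stats.var + self.eps)


def sync_stats(train_norm: Normalizer, eval_norm: Normalizer) -> None:
    """Give the eval wrapper an independent copy of the training statistics."""
    eval_norm.obs_stats = copy.deepcopy(train_norm.obs_stats)


# Training phase: statistics accumulate in the training wrapper.
train_norm = Normalizer(training=True)
for obs in [1.0, 2.0, 3.0, 4.0]:
    train_norm.normalize(obs)

# Validation phase: a separate, frozen wrapper receives a copy of the stats.
eval_norm = Normalizer(training=False)
sync_stats(train_norm, eval_norm)
```

In SB3 terms, this would presumably correspond to wrapping validation_env in its own VecNormalize (with training=False and norm_reward=False) and calling sync_envs_normalization(norm_env, eval_norm_env) after model.learn(...), rather than calling set_venv on the already initialized wrapper. Here eval_norm_env is an assumed name for that second wrapper.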

aleksanderhan commented 3 years ago

Thank you both very much!