hill-a / stable-baselines

A fork of OpenAI Baselines, implementations of reinforcement learning algorithms
http://stable-baselines.readthedocs.io/
MIT License

Cannot load pre-trained model to evaluate properly #1136

Closed wenjunli-0 closed 3 years ago

wenjunli-0 commented 3 years ago

I have trained a SAC model and stored checkpoints at different timesteps. I want to evaluate each of them and see how they perform in the testing environment. I can load and evaluate model_0.zip and model_1.zip properly, but it fails when loading model_2.zip with this message: "Loading a model without an environment, this model cannot be trained until it has a valid environment."

I tried several ways to fix this issue, but they all failed:

  1. defining a testing model exactly like the training model and then loading model_2.zip;
  2. calling model_testing.set_env(env_testing).

Here is my full training and evaluation code:
from stable_baselines import SAC
from stable_baselines.common.evaluation import evaluate_policy

# Train_Timestep, model_dir and the custom LunarLanderRandomized env are defined elsewhere.

# initial training run
env = LunarLanderRandomized(mep=11.0)
env.reset()
model = SAC(policy="MlpPolicy", env=env, verbose=1, seed=20210901)
model.learn(total_timesteps=Train_Timestep)
model.save(model_dir + '_{}'.format(0))

# continue training from the previous checkpoint
for j in range(1, 10):
    model = SAC.load(model_dir + '_{}'.format(j-1))

    # re-init env
    env = LunarLanderRandomized(mep=11.0)
    env.reset()

    # learn
    model.set_env(env)
    model.learn(total_timesteps=Train_Timestep)
    model.save(model_dir + '_{}'.format(j))

# evaluate every checkpoint
for i in range(0, 10):
    env_testing = LunarLanderRandomized(mep=11.0)
    env_testing.reset()

    model_testing = SAC(policy="MlpPolicy", env=env_testing, verbose=1, seed=20210901)    # with or without this line, it doesn't help

    model_testing = SAC.load(model_dir + '_{}'.format(i))
    model_testing.set_env(env_testing)

    mean_reward, std_reward = evaluate_policy(model_testing, env=env_testing, n_eval_episodes=10)
    print('testing performance at step-{}: Re={}'.format(i, mean_reward))

Could you please take a look and help me figure out how I should fix this? I've also tried DDPG, and it has the same bug.

Miffyli commented 3 years ago

Can you share the final exception/error you encounter? The message you shared is just a warning that training is not possible, but evaluation should still work. Setting the environment should definitely help.
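For example, something along these lines should already evaluate fine (a rough sketch reusing the env and paths from your snippet above):

from stable_baselines import SAC
from stable_baselines.common.evaluation import evaluate_policy

# The message only means the loaded model has no env attached yet; evaluate_policy
# takes the env explicitly, so it works without set_env.
model = SAC.load(model_dir + '_{}'.format(2))
env_testing = LunarLanderRandomized(mep=11.0)
mean_reward, std_reward = evaluate_policy(model, env=env_testing, n_eval_episodes=10)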

wenjunli-0 commented 3 years ago

Can you share the final exception/error you encounter? The message you shared is just a warning that training is not possible, but evaluation should still work. Setting the environment should definitely help.

Below are the output messages. The program gets stuck at step-2, i.e. when i=2. I waited more than 10 minutes for the result of step-2, whereas evaluating model_0.zip and model_1.zip only takes a few seconds each. As i increases, the evaluation seems to take longer and longer. I assume there is a bug somewhere; otherwise, a normal evaluation wouldn't be so slow.

Loading a model without an environment, this model cannot be trained until it has a valid environment.
Loading a model without an environment, this model cannot be trained until it has a valid environment.
Loading a model without an environment, this model cannot be trained until it has a valid environment.
Average testing performance at curriculum step-0: Re=-47.68431662522045
Loading a model without an environment, this model cannot be trained until it has a valid environment.
Average testing performance at curriculum step-1: Re=-79.47954591762584
Loading a model without an environment, this model cannot be trained until it has a valid environment.
Average testing performance at curriculum step-2: Re=-480.195298409764
Loading a model without an environment, this model cannot be trained until it has a valid environment.

Miffyli commented 3 years ago

Below are the output messages. The program gets stuck at step-2, i.e. when i=2. I waited more than 10 minutes for the result of step-2, whereas evaluating model_0.zip and model_1.zip only takes a few seconds each. As i increases, the evaluation seems to take longer and longer. I assume there is a bug somewhere; otherwise, a normal evaluation wouldn't be so slow.

I would double-check that the agent does indeed run (e.g. by creating a manual env-step loop and seeing how it behaves). 10 minutes for LunarLander does sound like way too much, but it could be that your agents are learning to play very long episodes, and evaluating those simply takes time.
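Something like this would tell you whether a single episode ever terminates (a rough sketch reusing the env and paths from your snippet; the 5000-step cap is just an arbitrary safeguard, not a stable-baselines setting):

from stable_baselines import SAC

env = LunarLanderRandomized(mep=11.0)
model = SAC.load(model_dir + '_{}'.format(2))

obs = env.reset()
total_reward, steps, done = 0.0, 0, False
while not done and steps < 5000:   # cap the loop so it cannot hang forever
    action, _ = model.predict(obs, deterministic=True)
    obs, reward, done, _ = env.step(action)
    total_reward += reward
    steps += 1
print('episode ended after {} steps, return={:.1f}'.format(steps, total_reward))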

wenjunli-0 commented 3 years ago

Below are the output messages. The program gets stuck at step-2, i.e. when i=2. I waited more than 10 minutes for the result of step-2, whereas evaluating model_0.zip and model_1.zip only takes a few seconds each. As i increases, the evaluation seems to take longer and longer. I assume there is a bug somewhere; otherwise, a normal evaluation wouldn't be so slow.

I would double-check that the agent does indeed run (e.g. by creating a manual env-step loop and seeing how it behaves). 10 minutes for LunarLander does sound like way too much, but it could be that your agents are learning to play very long episodes, and evaluating those simply takes time.

Thanks for your explanation; I may have found the problem. The LunarLander environment does not have a max timestep per episode, so during evaluation the agent might just keep flying forever. This issue does not happen during training, only during evaluation. After I added a max-timestep limit to the LunarLander env, the issue was fixed.

I guess there is a max-timestep limit applied inside model.learn(), but no such limit in evaluate_policy().
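For reference, gym's standard TimeLimit wrapper is one way to impose such a limit during evaluation (a minimal sketch; the 1000-step cap is just an example value):

from gym.wrappers import TimeLimit
from stable_baselines import SAC
from stable_baselines.common.evaluation import evaluate_policy

# Wrap the custom env so every evaluation episode is cut off after a fixed
# number of steps, even if the agent never lands.
env_testing = TimeLimit(LunarLanderRandomized(mep=11.0), max_episode_steps=1000)
model_testing = SAC.load(model_dir + '_{}'.format(2))
mean_reward, std_reward = evaluate_policy(model_testing, env=env_testing, n_eval_episodes=10)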