hill-a / stable-baselines

A fork of OpenAI Baselines, implementations of reinforcement learning algorithms
http://stable-baselines.readthedocs.io/
MIT License

evaluate_policy of MlpLstmPolicy with DummyVecEnv #1129

Closed: LeZhengThu closed this issue 3 years ago

LeZhengThu commented 3 years ago

Hi @araffin @Miffyli, I'm working with an MlpLstmPolicy on a DummyVecEnv and want to evaluate the policy. However, I get the following error: AssertionError: You must pass only one environment when using this function. Below is the code that reproduces the error.

import gym
from stable_baselines import PPO2
from stable_baselines.common.evaluation import evaluate_policy
from stable_baselines.common.vec_env import DummyVecEnv

def make_env():
    # Factory returning a callable that builds a fresh CartPole env
    def maker():
        env = gym.make('CartPole-v1')
        return env
    return maker

n_training_envs = 4
envs = DummyVecEnv([make_env() for _ in range(n_training_envs)])
# For recurrent policies, the number of envs must be a multiple of nminibatches
lstm_model = PPO2('MlpLstmPolicy', envs, nminibatches=n_training_envs)
lstm_model.learn(total_timesteps=1000)
evaluate_policy(lstm_model, lstm_model.get_env())  # raises the AssertionError quoted above

Miffyli commented 3 years ago

As the error states, you can only use one environment for evaluation with evaluate_policy. You need to create a new DummyVecEnv that contains only one of your environments.
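A minimal sketch of that suggestion (the eval_env name is illustrative; whether a recurrent policy's predict accepts this single-env batch is what the follow-up comments below are about):

import gym
from stable_baselines.common.evaluation import evaluate_policy
from stable_baselines.common.vec_env import DummyVecEnv

# Separate, single-env DummyVecEnv used only for evaluation
eval_env = DummyVecEnv([lambda: gym.make('CartPole-v1')])

# Returns (mean_reward, std_reward) over n_eval_episodes episodes.
# NOTE: for MlpLstmPolicy, see the comments below about predict's batch-size constraint.
mean_reward, std_reward = evaluate_policy(lstm_model, eval_env, n_eval_episodes=10)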

LeZhengThu commented 3 years ago

@Miffyli What I mean is: is there any way to use multiple envs here to get a speed-up? If I understand correctly, the purpose of a vectorized env like DummyVecEnv is to stack multiple independent environments into a single environment and speed up the training process.

Miffyli commented 3 years ago

Unfortunately not. This is further constrained by LSTM policies, which require the same number of envs for predict as were used during training; evaluate_policy automatically handles this for you.
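To illustrate the constraint: with an MlpLstmPolicy trained on 4 envs, predict has to be fed the whole 4-env batch, so a hand-rolled evaluation loop over the training VecEnv (following the recurrent-policies example in the docs, passing the LSTM state and the done mask to predict) looks roughly like this. The names and episode count are illustrative, not part of the library:

import numpy as np

venv = lstm_model.get_env()                 # the 4-env training DummyVecEnv
obs = venv.reset()
state = None                                # None means initial LSTM state
done = [False] * venv.num_envs              # mask used to reset the LSTM state per env
episode_rewards = []
current_rewards = np.zeros(venv.num_envs)

while len(episode_rewards) < 20:            # stop after 20 completed episodes
    action, state = lstm_model.predict(obs, state=state, mask=done)
    obs, rewards, done, _ = venv.step(action)
    current_rewards += rewards
    for i, episode_over in enumerate(done):
        if episode_over:                    # DummyVecEnv auto-resets the finished env
            episode_rewards.append(current_rewards[i])
            current_rewards[i] = 0.0

print(np.mean(episode_rewards), np.std(episode_rewards))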

If you can live without LSTM policies, check out SB3, where policies can be evaluated with multiple envs.
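For comparison, a sketch of the SB3 route (assuming a recent SB3 version, where evaluate_policy accepts a VecEnv with several environments; names are illustrative):

from stable_baselines3 import PPO
from stable_baselines3.common.env_util import make_vec_env
from stable_baselines3.common.evaluation import evaluate_policy

# Four parallel CartPole envs, shared here between training and evaluation
envs = make_vec_env("CartPole-v1", n_envs=4)

model = PPO("MlpPolicy", envs)              # core SB3 has no LSTM policies
model.learn(total_timesteps=10_000)

# SB3's evaluate_policy spreads the evaluation episodes across all envs
mean_reward, std_reward = evaluate_policy(model, envs, n_eval_episodes=20)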

LeZhengThu commented 3 years ago

OK, that's fair. Thanks for the reply. I'll close this question.