hill-a / stable-baselines

A fork of OpenAI Baselines, implementations of reinforcement learning algorithms
http://stable-baselines.readthedocs.io/
MIT License

Converting OpenAI vectorized env to stable baselines vectorized env? #665

Closed GNiendorf closed 3 years ago

GNiendorf commented 4 years ago

I am trying to take an OpenAI Baselines environment (a vectorized Procgen env) and convert it to a Stable Baselines vectorized environment. Below is my naive attempt. I get an error saying that a function is expected, but I am not sure how to resolve it. Any help is appreciated.


```python
from procgen import ProcgenEnv

from baselines.common.vec_env import (
    VecExtractDictObs,
    VecMonitor,
    VecFrameStack,
    VecNormalize
)
# DummyVecEnv comes from stable-baselines (see the traceback below)
from stable_baselines.common.vec_env import DummyVecEnv

venv = ProcgenEnv(num_envs=200, env_name="coinrun")
venv = VecExtractDictObs(venv, "rgb")
venv = VecMonitor(venv=venv, filename=None, keep_buf=200)
venv = VecNormalize(venv=venv, ob=False)
venv = DummyVecEnv(venv)
```

```
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-2-36f973b40ef0> in <module>
     16 venv = VecNormalize(venv=venv, ob=False)
     17 
---> 18 venv = DummyVecEnv(venv)

~/Documents/drl/research/venv_proc/lib/python3.6/site-packages/stable_baselines/common/vec_env/dummy_vec_env.py in __init__(self, env_fns)
     18 
     19     def __init__(self, env_fns):
---> 20         self.envs = [fn() for fn in env_fns]
     21         env = self.envs[0]
     22         VecEnv.__init__(self, len(env_fns), env.observation_space, env.action_space)

TypeError: 'VecNormalize' object is not iterable
```
araffin commented 4 years ago

Hello, I think you should use the gym.Env (via gym.make, cf. the README) instead of ProcgenEnv. And why do you want to convert a VecNormalize to a DummyVecEnv? (VecNormalize is already a VecEnv...)
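
For reference, a minimal sketch of the gym.make route referred to above; the env id `procgen:procgen-coinrun-v0` and the use of CnnPolicy are assumptions, not something confirmed in this thread:

```python
import gym

from stable_baselines import PPO2
from stable_baselines.common.policies import CnnPolicy

# gym.make returns a single (non-vectorized) gym.Env; stable-baselines
# wraps it in a DummyVecEnv automatically when the model is created.
env = gym.make("procgen:procgen-coinrun-v0")
model = PPO2(CnnPolicy, env, verbose=1)
model.learn(total_timesteps=10000)
```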

GNiendorf commented 4 years ago

I am trying to convert it because when I try to train a PPO agent directly on the venv I get the following error:


```
Wrapping the env in a DummyVecEnv.

ValueError                                Traceback (most recent call last)
<ipython-input-12-42f191de9360> in <module>
      5 
      6 model = PPO2(MlpPolicy, venv, verbose=1)
----> 7 model.learn(total_timesteps=10000)

~/Documents/drl/research/venv_proc/lib/python3.6/site-packages/stable_baselines/ppo2/ppo2.py in learn(self, total_timesteps, callback, log_interval, tb_log_name, reset_num_timesteps)
    317             self._setup_learn()
    318 
--> 319             runner = Runner(env=self.env, model=self, n_steps=self.n_steps, gamma=self.gamma, lam=self.lam)
    320             self.episode_reward = np.zeros((self.n_envs,))
    321 

~/Documents/drl/research/venv_proc/lib/python3.6/site-packages/stable_baselines/ppo2/ppo2.py in __init__(self, env, model, n_steps, gamma, lam)
    447         :param lam: (float) Factor for trade-off of bias vs variance for Generalized Advantage Estimator
    448         """
--> 449         super().__init__(env=env, model=model, n_steps=n_steps)
    450         self.lam = lam
    451         self.gamma = gamma

~/Documents/drl/research/venv_proc/lib/python3.6/site-packages/stable_baselines/common/runners.py in __init__(self, env, model, n_steps)
     17         self.batch_ob_shape = (n_env*n_steps,) + env.observation_space.shape
     18         self.obs = np.zeros((n_env,) + env.observation_space.shape, dtype=env.observation_space.dtype.name)
---> 19         self.obs[:] = env.reset()
     20         self.n_steps = n_steps
     21         self.states = model.initial_state

~/Documents/drl/research/venv_proc/lib/python3.6/site-packages/stable_baselines/common/vec_env/dummy_vec_env.py in reset(self)
     51         for env_idx in range(self.num_envs):
     52             obs = self.envs[env_idx].reset()
---> 53             self._save_obs(env_idx, obs)
     54         return self._obs_from_buf()
     55 

~/Documents/drl/research/venv_proc/lib/python3.6/site-packages/stable_baselines/common/vec_env/dummy_vec_env.py in _save_obs(self, env_idx, obs)
     70         for key in self.keys:
     71             if key is None:
---> 72                 self.buf_obs[key][env_idx] = obs
     73             else:
     74                 self.buf_obs[key][env_idx] = obs[key]

ValueError: could not broadcast input array from shape (200,64,64,3) into shape (64,64,3)
```
Miffyli commented 4 years ago

Two things:

1. You should use CnnPolicy instead of MlpPolicy, since the observations are images.
2. There is a chance the VecEnv coming out of ProcGen does not work in stable-baselines as-is, which should probably be fixed as a bug (it being such a nice environment).

GNiendorf commented 4 years ago

I get the same error with CnnPolicy:

```
ValueError: could not broadcast input array from shape (200,64,64,3) into shape (64,64,3)
```

Also, calling gym.make on a Procgen environment ultimately calls ProcgenEnv under the hood.

GNiendorf commented 4 years ago

Here I forced an error with gym.make by passing a bad keyword argument; the end of the traceback shows that it calls ProcgenEnv:


```
~/Documents/drl/research/venv_proc/lib/python3.6/site-packages/gym/envs/registration.py in make(id, **kwargs)
    154 
    155 def make(id, **kwargs):
--> 156     return registry.make(id, **kwargs)
    157 
    158 def spec(id):

~/Documents/drl/research/venv_proc/lib/python3.6/site-packages/gym/envs/registration.py in make(self, path, **kwargs)
     99             logger.info('Making new env: %s', path)
    100         spec = self.spec(path)
--> 101         env = spec.make(**kwargs)
    102         # We used to have people override _reset/_step rather than
    103         # reset/step. Set _gym_disable_underscore_compat = True on

~/Documents/drl/research/venv_proc/lib/python3.6/site-packages/gym/envs/registration.py in make(self, **kwargs)
     71         else:
     72             cls = load(self.entry_point)
---> 73             env = cls(**_kwargs)
     74 
     75         # Make the enviroment aware of which spec it came from.

~/Documents/drl/research/venv_proc/lib/python3.6/site-packages/procgen/gym_registration.py in make_env(**kwargs)
     16 
     17 def make_env(**kwargs):
---> 18     venv = ProcgenEnv(num_envs=1, num_threads=0, **kwargs)
     19     env = Scalarize(venv)
     20     env = RemoveDictObs(env, key="rgb")

~/Documents/drl/research/venv_proc/lib/python3.6/site-packages/procgen/env.py in __init__(self, num_envs, env_name, center_agent, options, use_generated_assets, paint_vel_info, distribution_mode, **kwargs)
    182             }
    183         )
--> 184         super().__init__(num_envs, env_name, options, **kwargs)
```
araffin commented 4 years ago

PS: As mentioned in the issue template, please use markdown code blocks for both code and stack traces.

EDIT: Did you check the env with the env_checker? (also mentioned in the issue template)
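
A minimal sketch of how check_env would be invoked here; note that it expects a single gym.Env rather than a VecEnv, so it has to be run on the gym.make version of the environment (the env id is assumed):

```python
import gym

from stable_baselines.common.env_checker import check_env

# check_env asserts isinstance(env, gym.Env), so it cannot be called on
# ProcgenEnv (a VecEnv); use the scalar gym registration instead.
env = gym.make("procgen:procgen-coinrun-v0")
check_env(env)
```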

Miffyli commented 4 years ago

And to clarify @araffin's comment: did you remove the extra VecEnv wrappers? What happens if you do just

```python
venv = ProcgenEnv(num_envs=200, env_name="coinrun")
model = PPO2(CnnPolicy, venv, verbose=1)
```
GNiendorf commented 4 years ago

In both cases (removing the extra VecEnvs and keeping them) I get the following error when using env_checker:

```
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input-38-3559b4aab3c7> in <module>
----> 1 check_env(venv)

~/Documents/drl/research/venv_proc/lib/python3.6/site-packages/stable_baselines/common/env_checker.py in check_env(env, warn, skip_render_check)
    179         True by default (useful for the CI)
    180     """
--> 181     assert isinstance(env, gym.Env), ("You environment must inherit from gym.Env class "
    182                                       " cf https://github.com/openai/gym/blob/master/gym/core.py")
    183 

AssertionError: You environment must inherit from gym.Env class  cf https://github.com/openai/gym/blob/master/gym/core.py
```
GNiendorf commented 4 years ago

Here, if I remove the extra calls, I get the same error:


```
Wrapping the env in a DummyVecEnv.
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-42-2f9ca1a80365> in <module>
      7 venv = VecExtractDictObs(venv, "rgb")
      8 model = PPO2(CnnPolicy, venv, verbose=1)
----> 9 model.learn(total_timesteps=10000)

~/Documents/drl/research/venv_proc/lib/python3.6/site-packages/stable_baselines/ppo2/ppo2.py in learn(self, total_timesteps, callback, log_interval, tb_log_name, reset_num_timesteps)
    317             self._setup_learn()
    318 
--> 319             runner = Runner(env=self.env, model=self, n_steps=self.n_steps, gamma=self.gamma, lam=self.lam)
    320             self.episode_reward = np.zeros((self.n_envs,))
    321 

~/Documents/drl/research/venv_proc/lib/python3.6/site-packages/stable_baselines/ppo2/ppo2.py in __init__(self, env, model, n_steps, gamma, lam)
    447         :param lam: (float) Factor for trade-off of bias vs variance for Generalized Advantage Estimator
    448         """
--> 449         super().__init__(env=env, model=model, n_steps=n_steps)
    450         self.lam = lam
    451         self.gamma = gamma

~/Documents/drl/research/venv_proc/lib/python3.6/site-packages/stable_baselines/common/runners.py in __init__(self, env, model, n_steps)
     17         self.batch_ob_shape = (n_env*n_steps,) + env.observation_space.shape
     18         self.obs = np.zeros((n_env,) + env.observation_space.shape, dtype=env.observation_space.dtype.name)
---> 19         self.obs[:] = env.reset()
     20         self.n_steps = n_steps
     21         self.states = model.initial_state

~/Documents/drl/research/venv_proc/lib/python3.6/site-packages/stable_baselines/common/vec_env/dummy_vec_env.py in reset(self)
     51         for env_idx in range(self.num_envs):
     52             obs = self.envs[env_idx].reset()
---> 53             self._save_obs(env_idx, obs)
     54         return self._obs_from_buf()
     55 

~/Documents/drl/research/venv_proc/lib/python3.6/site-packages/stable_baselines/common/vec_env/dummy_vec_env.py in _save_obs(self, env_idx, obs)
     70         for key in self.keys:
     71             if key is None:
---> 72                 self.buf_obs[key][env_idx] = obs
     73             else:
     74                 self.buf_obs[key][env_idx] = obs[key]

ValueError: could not broadcast input array from shape (200,64,64,3) into shape (64,64,3)
```
GNiendorf commented 4 years ago

If I remove the VecExtractDictObs call, though, I get this error:


```
Wrapping the env in a DummyVecEnv.
---------------------------------------------------------------------------
NotImplementedError                       Traceback (most recent call last)
<ipython-input-43-7e513d6644be> in <module>
      5 
      6 venv = ProcgenEnv(num_envs=200, env_name="coinrun")
----> 7 model = PPO2(CnnPolicy, venv, verbose=1)
      8 model.learn(total_timesteps=10000)

~/Documents/drl/research/venv_proc/lib/python3.6/site-packages/stable_baselines/ppo2/ppo2.py in __init__(self, policy, env, gamma, n_steps, ent_coef, learning_rate, vf_coef, max_grad_norm, lam, nminibatches, noptepochs, cliprange, cliprange_vf, verbose, tensorboard_log, _init_setup_model, policy_kwargs, full_tensorboard_log, seed, n_cpu_tf_sess)
    102 
    103         if _init_setup_model:
--> 104             self.setup_model()
    105 
    106     def _get_pretrain_placeholders(self):

~/Documents/drl/research/venv_proc/lib/python3.6/site-packages/stable_baselines/ppo2/ppo2.py in setup_model(self)
    132 
    133                 act_model = self.policy(self.sess, self.observation_space, self.action_space, self.n_envs, 1,
--> 134                                         n_batch_step, reuse=False, **self.policy_kwargs)
    135                 with tf.variable_scope("train_model", reuse=True,
    136                                        custom_getter=tf_util.outer_scope_getter("train_model")):

~/Documents/drl/research/venv_proc/lib/python3.6/site-packages/stable_baselines/common/policies.py in __init__(self, sess, ob_space, ac_space, n_env, n_steps, n_batch, reuse, **_kwargs)
    599     def __init__(self, sess, ob_space, ac_space, n_env, n_steps, n_batch, reuse=False, **_kwargs):
    600         super(CnnPolicy, self).__init__(sess, ob_space, ac_space, n_env, n_steps, n_batch, reuse,
--> 601                                         feature_extraction="cnn", **_kwargs)
    602 
    603 

~/Documents/drl/research/venv_proc/lib/python3.6/site-packages/stable_baselines/common/policies.py in __init__(self, sess, ob_space, ac_space, n_env, n_steps, n_batch, reuse, layers, net_arch, act_fun, cnn_extractor, feature_extraction, **kwargs)
    538                  act_fun=tf.tanh, cnn_extractor=nature_cnn, feature_extraction="cnn", **kwargs):
    539         super(FeedForwardPolicy, self).__init__(sess, ob_space, ac_space, n_env, n_steps, n_batch, reuse=reuse,
--> 540                                                 scale=(feature_extraction == "cnn"))
    541 
    542         self._kwargs_check(feature_extraction, kwargs)

~/Documents/drl/research/venv_proc/lib/python3.6/site-packages/stable_baselines/common/policies.py in __init__(self, sess, ob_space, ac_space, n_env, n_steps, n_batch, reuse, scale)
    219     def __init__(self, sess, ob_space, ac_space, n_env, n_steps, n_batch, reuse=False, scale=False):
    220         super(ActorCriticPolicy, self).__init__(sess, ob_space, ac_space, n_env, n_steps, n_batch, reuse=reuse,
--> 221                                                 scale=scale)
    222         self._pdtype = make_proba_dist_type(ac_space)
    223         self._policy = None

~/Documents/drl/research/venv_proc/lib/python3.6/site-packages/stable_baselines/common/policies.py in __init__(self, sess, ob_space, ac_space, n_env, n_steps, n_batch, reuse, scale, obs_phs, add_action_ph)
    115         with tf.variable_scope("input", reuse=False):
    116             if obs_phs is None:
--> 117                 self._obs_ph, self._processed_obs = observation_input(ob_space, n_batch, scale=scale)
    118             else:
    119                 self._obs_ph, self._processed_obs = obs_phs

~/Documents/drl/research/venv_proc/lib/python3.6/site-packages/stable_baselines/common/input.py in observation_input(ob_space, batch_size, name, scale)
     49     else:
     50         raise NotImplementedError("Error: the model does not support input space of type {}".format(
---> 51             type(ob_space).__name__))

NotImplementedError: Error: the model does not support input space of type Dict
```
GNiendorf commented 4 years ago

If I just use gym.make as suggested, it works. The problem is that I can't specify the number of environments I want to train on (say, 200 as before), since the gym registration code calls ProcgenEnv with num_envs=1.

```python
env = gym.make("procgen:procgen-maze-v0", distribution_mode='easy')
model = PPO2(CnnPolicy, env, verbose=1)
model.learn(total_timesteps=10000)
```
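
One possible way to get several environments while staying on the gym.make route is to build the VecEnv from environment factory functions; this is only a sketch (the helper `make_env`, the choice of SubprocVecEnv, and `n_envs=8` are illustrative assumptions, and this loses Procgen's native C++ vectorization, so 200 separate gym envs may be heavy):

```python
import gym

from stable_baselines import PPO2
from stable_baselines.common.policies import CnnPolicy
from stable_baselines.common.vec_env import SubprocVecEnv  # or DummyVecEnv


def make_env():
    # Each worker builds its own single-env Procgen instance via gym.make
    return gym.make("procgen:procgen-maze-v0", distribution_mode='easy')


if __name__ == "__main__":
    n_envs = 8  # adjust; each env is a separate Procgen process
    venv = SubprocVecEnv([make_env for _ in range(n_envs)])
    model = PPO2(CnnPolicy, venv, verbose=1)
    model.learn(total_timesteps=10000)
```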
Miffyli commented 4 years ago

Ah right, because by default the environment works on Dicts. Am I correct to assume that this one did not work either?

```python
venv = ProcgenEnv(num_envs=200, env_name="coinrun")
venv = VecExtractDictObs(venv, "rgb")
model = PPO2(CnnPolicy, venv, verbose=1)
```

If that does not work, you can still create the environments manually. You would have to create a wrapper similar to VecExtractDictObs that extracts the "rgb" item from the observation dictionary for each environment individually (see the sketch below).
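
A minimal sketch of such a per-environment wrapper; the class name `ExtractRgbObs` is hypothetical, and it assumes each individual env returns a Dict observation containing an "rgb" key:

```python
import gym


class ExtractRgbObs(gym.ObservationWrapper):
    """Per-env analogue of VecExtractDictObs: return only obs["rgb"]."""

    def __init__(self, env):
        super().__init__(env)
        # Narrow the Dict observation space down to its "rgb" sub-space
        self.observation_space = env.observation_space.spaces["rgb"]

    def observation(self, observation):
        return observation["rgb"]


# usage (hypothetical): env = ExtractRgbObs(single_procgen_env)
```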

GNiendorf commented 4 years ago

That's correct, I get the same error as before (when including those extra venv calls):

```
ValueError: could not broadcast input array from shape (200,64,64,3) into shape (64,64,3)
```

GNiendorf commented 4 years ago

I see. I will attempt to create the wrapper. Thank you for the suggestion.

Miffyli commented 4 years ago

Thanks for trying this out! Sounds like a bug-ish thing we should fix, or at least we should offer tools so users don't have to redo the work of wrapping that environment.

araffin commented 4 years ago

> You have to create a wrapper similar to VecExtractDictObs that extracts the "rgb" item from the observation dictionary for each environment individually.

This already exists: https://github.com/openai/gym/blob/master/gym/wrappers/filter_observation.py

araffin commented 3 years ago

Fixed in SB3: https://github.com/DLR-RM/stable-baselines3/pull/311