hill-a / stable-baselines

A fork of OpenAI Baselines, implementations of reinforcement learning algorithms
http://stable-baselines.readthedocs.io/
MIT License

Converting OpenAI vectorized env to stable baselines vectorized env? #665

Closed GNiendorf closed 3 years ago

GNiendorf commented 4 years ago

I am trying to take an OpenAI Baselines environment (a vectorized Procgen env) and convert it to a Stable Baselines vectorized environment. Below is my naive attempt. I get an error saying that a function is expected, but I am not sure how to resolve it. Any help is appreciated.


```python
from procgen import ProcgenEnv

from baselines.common.vec_env import (
    VecExtractDictObs,
    VecMonitor,
    VecFrameStack,
    VecNormalize
)
# DummyVecEnv comes from stable-baselines (see the traceback below)
from stable_baselines.common.vec_env import DummyVecEnv

venv = ProcgenEnv(num_envs=200, env_name="coinrun")
venv = VecExtractDictObs(venv, "rgb")
venv = VecMonitor(venv=venv, filename=None, keep_buf=200)
venv = VecNormalize(venv=venv, ob=False)
venv = DummyVecEnv(venv)
```

```
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-2-36f973b40ef0> in <module>
     16 venv = VecNormalize(venv=venv, ob=False)
     17 
---> 18 venv = DummyVecEnv(venv)

~/Documents/drl/research/venv_proc/lib/python3.6/site-packages/stable_baselines/common/vec_env/dummy_vec_env.py in __init__(self, env_fns)
     18 
     19     def __init__(self, env_fns):
---> 20         self.envs = [fn() for fn in env_fns]
     21         env = self.envs[0]
     22         VecEnv.__init__(self, len(env_fns), env.observation_space, env.action_space)

TypeError: 'VecNormalize' object is not iterable
```
araffin commented 4 years ago

Hello, I think you should use the gym.Env (via gym.make, cf. the README) instead of ProcgenEnv. And why do you want to convert a VecNormalize to a DummyVecEnv? (VecNormalize is already a VecEnv...)
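
For reference, a minimal sketch of the gym.make route referred to above; the env id `procgen:procgen-coinrun-v0` and the use of CnnPolicy are assumptions, not something confirmed in this thread:

```python
import gym

from stable_baselines import PPO2
from stable_baselines.common.policies import CnnPolicy

# gym.make returns a single (non-vectorized) gym.Env; stable-baselines
# wraps it in a DummyVecEnv automatically when the model is created.
env = gym.make("procgen:procgen-coinrun-v0")
model = PPO2(CnnPolicy, env, verbose=1)
model.learn(total_timesteps=10000)
```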

GNiendorf commented 4 years ago

I am trying to convert it because when I try to train a PPO agent directly on the venv I get the following error:


```
Wrapping the env in a DummyVecEnv.

ValueError                                Traceback (most recent call last)
<ipython-input-12-42f191de9360> in <module>
      5 
      6 model = PPO2(MlpPolicy, venv, verbose=1)
----> 7 model.learn(total_timesteps=10000)

~/Documents/drl/research/venv_proc/lib/python3.6/site-packages/stable_baselines/ppo2/ppo2.py in learn(self, total_timesteps, callback, log_interval, tb_log_name, reset_num_timesteps)
    317             self._setup_learn()
    318 
--> 319             runner = Runner(env=self.env, model=self, n_steps=self.n_steps, gamma=self.gamma, lam=self.lam)
    320             self.episode_reward = np.zeros((self.n_envs,))
    321 

~/Documents/drl/research/venv_proc/lib/python3.6/site-packages/stable_baselines/ppo2/ppo2.py in __init__(self, env, model, n_steps, gamma, lam)
    447         :param lam: (float) Factor for trade-off of bias vs variance for Generalized Advantage Estimator
    448         """
--> 449         super().__init__(env=env, model=model, n_steps=n_steps)
    450         self.lam = lam
    451         self.gamma = gamma

~/Documents/drl/research/venv_proc/lib/python3.6/site-packages/stable_baselines/common/runners.py in __init__(self, env, model, n_steps)
     17         self.batch_ob_shape = (n_env*n_steps,) + env.observation_space.shape
     18         self.obs = np.zeros((n_env,) + env.observation_space.shape, dtype=env.observation_space.dtype.name)
---> 19         self.obs[:] = env.reset()
     20         self.n_steps = n_steps
     21         self.states = model.initial_state

~/Documents/drl/research/venv_proc/lib/python3.6/site-packages/stable_baselines/common/vec_env/dummy_vec_env.py in reset(self)
     51         for env_idx in range(self.num_envs):
     52             obs = self.envs[env_idx].reset()
---> 53             self._save_obs(env_idx, obs)
     54         return self._obs_from_buf()
     55 

~/Documents/drl/research/venv_proc/lib/python3.6/site-packages/stable_baselines/common/vec_env/dummy_vec_env.py in _save_obs(self, env_idx, obs)
     70         for key in self.keys:
     71             if key is None:
---> 72                 self.buf_obs[key][env_idx] = obs
     73             else:
     74                 self.buf_obs[key][env_idx] = obs[key]

ValueError: could not broadcast input array from shape (200,64,64,3) into shape (64,64,3)
```
Miffyli commented 4 years ago

Two things:

1. You should use CnnPolicy instead of MlpPolicy, since the observations are images.
2. There is a chance the VecEnv coming out of ProcGen does not work in stable-baselines as-is, which should probably be fixed as a bug (it being such a nice environment).

GNiendorf commented 4 years ago

I get the same error with CnnPolicy:

```
ValueError: could not broadcast input array from shape (200,64,64,3) into shape (64,64,3)
```

Also, calling gym.make on a Procgen environment ultimately calls ProcgenEnv under the hood.

GNiendorf commented 4 years ago

Here I forced an error with gym.make by passing a bad keyword argument; the end of the traceback shows that it calls ProcgenEnv:


```
~/Documents/drl/research/venv_proc/lib/python3.6/site-packages/gym/envs/registration.py in make(id, **kwargs)
    154 
    155 def make(id, **kwargs):
--> 156     return registry.make(id, **kwargs)
    157 
    158 def spec(id):

~/Documents/drl/research/venv_proc/lib/python3.6/site-packages/gym/envs/registration.py in make(self, path, **kwargs)
     99             logger.info('Making new env: %s', path)
    100         spec = self.spec(path)
--> 101         env = spec.make(**kwargs)
    102         # We used to have people override _reset/_step rather than
    103         # reset/step. Set _gym_disable_underscore_compat = True on

~/Documents/drl/research/venv_proc/lib/python3.6/site-packages/gym/envs/registration.py in make(self, **kwargs)
     71         else:
     72             cls = load(self.entry_point)
---> 73             env = cls(**_kwargs)
     74 
     75         # Make the enviroment aware of which spec it came from.

~/Documents/drl/research/venv_proc/lib/python3.6/site-packages/procgen/gym_registration.py in make_env(**kwargs)
     16 
     17 def make_env(**kwargs):
---> 18     venv = ProcgenEnv(num_envs=1, num_threads=0, **kwargs)
     19     env = Scalarize(venv)
     20     env = RemoveDictObs(env, key="rgb")

~/Documents/drl/research/venv_proc/lib/python3.6/site-packages/procgen/env.py in __init__(self, num_envs, env_name, center_agent, options, use_generated_assets, paint_vel_info, distribution_mode, **kwargs)
    182             }
    183         )
--> 184         super().__init__(num_envs, env_name, options, **kwargs)
```
araffin commented 4 years ago

PS: As mentioned in the issue template, please use markdown code blocks for both code and stack traces.

EDIT: Did you check the env with the env_checker? (also mentioned in the issue template)
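
A minimal sketch of how check_env would be invoked here; note that it expects a single gym.Env rather than a VecEnv, so it has to be run on the gym.make version of the environment (the env id is assumed):

```python
import gym

from stable_baselines.common.env_checker import check_env

# check_env asserts isinstance(env, gym.Env), so it cannot be called on
# ProcgenEnv (a VecEnv); use the scalar gym registration instead.
env = gym.make("procgen:procgen-coinrun-v0")
check_env(env)
```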

Miffyli commented 4 years ago

And to clarify @araffin's comment: did you remove the extra VecEnv wrappers? What happens if you do just

```python
venv = ProcgenEnv(num_envs=200, env_name="coinrun")
model = PPO2(CnnPolicy, venv, verbose=1)
```
GNiendorf commented 4 years ago

In both cases (removing the extra VecEnvs and keeping them) I get the following error when using env_checker:

```
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input-38-3559b4aab3c7> in <module>
----> 1 check_env(venv)

~/Documents/drl/research/venv_proc/lib/python3.6/site-packages/stable_baselines/common/env_checker.py in check_env(env, warn, skip_render_check)
    179         True by default (useful for the CI)
    180     """
--> 181     assert isinstance(env, gym.Env), ("You environment must inherit from gym.Env class "
    182                                       " cf https://github.com/openai/gym/blob/master/gym/core.py")
    183 

AssertionError: You environment must inherit from gym.Env class  cf https://github.com/openai/gym/blob/master/gym/core.py
```
GNiendorf commented 4 years ago

Here, if I remove the extra calls, I get the same error:


```
Wrapping the env in a DummyVecEnv.
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-42-2f9ca1a80365> in <module>
      7 venv = VecExtractDictObs(venv, "rgb")
      8 model = PPO2(CnnPolicy, venv, verbose=1)
----> 9 model.learn(total_timesteps=10000)

~/Documents/drl/research/venv_proc/lib/python3.6/site-packages/stable_baselines/ppo2/ppo2.py in learn(self, total_timesteps, callback, log_interval, tb_log_name, reset_num_timesteps)
    317             self._setup_learn()
    318 
--> 319             runner = Runner(env=self.env, model=self, n_steps=self.n_steps, gamma=self.gamma, lam=self.lam)
    320             self.episode_reward = np.zeros((self.n_envs,))
    321 

~/Documents/drl/research/venv_proc/lib/python3.6/site-packages/stable_baselines/ppo2/ppo2.py in __init__(self, env, model, n_steps, gamma, lam)
    447         :param lam: (float) Factor for trade-off of bias vs variance for Generalized Advantage Estimator
    448         """
--> 449         super().__init__(env=env, model=model, n_steps=n_steps)
    450         self.lam = lam
    451         self.gamma = gamma

~/Documents/drl/research/venv_proc/lib/python3.6/site-packages/stable_baselines/common/runners.py in __init__(self, env, model, n_steps)
     17         self.batch_ob_shape = (n_env*n_steps,) + env.observation_space.shape
     18         self.obs = np.zeros((n_env,) + env.observation_space.shape, dtype=env.observation_space.dtype.name)
---> 19         self.obs[:] = env.reset()
     20         self.n_steps = n_steps
     21         self.states = model.initial_state

~/Documents/drl/research/venv_proc/lib/python3.6/site-packages/stable_baselines/common/vec_env/dummy_vec_env.py in reset(self)
     51         for env_idx in range(self.num_envs):
     52             obs = self.envs[env_idx].reset()
---> 53             self._save_obs(env_idx, obs)
     54         return self._obs_from_buf()
     55 

~/Documents/drl/research/venv_proc/lib/python3.6/site-packages/stable_baselines/common/vec_env/dummy_vec_env.py in _save_obs(self, env_idx, obs)
     70         for key in self.keys:
     71             if key is None:
---> 72                 self.buf_obs[key][env_idx] = obs
     73             else:
     74                 self.buf_obs[key][env_idx] = obs[key]

ValueError: could not broadcast input array from shape (200,64,64,3) into shape (64,64,3)
```
GNiendorf commented 4 years ago

If I remove the VecExtractDictObs call, though, I get this error:


```
Wrapping the env in a DummyVecEnv.
---------------------------------------------------------------------------
NotImplementedError                       Traceback (most recent call last)
<ipython-input-43-7e513d6644be> in <module>
      5 
      6 venv = ProcgenEnv(num_envs=200, env_name="coinrun")
----> 7 model = PPO2(CnnPolicy, venv, verbose=1)
      8 model.learn(total_timesteps=10000)

~/Documents/drl/research/venv_proc/lib/python3.6/site-packages/stable_baselines/ppo2/ppo2.py in __init__(self, policy, env, gamma, n_steps, ent_coef, learning_rate, vf_coef, max_grad_norm, lam, nminibatches, noptepochs, cliprange, cliprange_vf, verbose, tensorboard_log, _init_setup_model, policy_kwargs, full_tensorboard_log, seed, n_cpu_tf_sess)
    102 
    103         if _init_setup_model:
--> 104             self.setup_model()
    105 
    106     def _get_pretrain_placeholders(self):

~/Documents/drl/research/venv_proc/lib/python3.6/site-packages/stable_baselines/ppo2/ppo2.py in setup_model(self)
    132 
    133                 act_model = self.policy(self.sess, self.observation_space, self.action_space, self.n_envs, 1,
--> 134                                         n_batch_step, reuse=False, **self.policy_kwargs)
    135                 with tf.variable_scope("train_model", reuse=True,
    136                                        custom_getter=tf_util.outer_scope_getter("train_model")):

~/Documents/drl/research/venv_proc/lib/python3.6/site-packages/stable_baselines/common/policies.py in __init__(self, sess, ob_space, ac_space, n_env, n_steps, n_batch, reuse, **_kwargs)
    599     def __init__(self, sess, ob_space, ac_space, n_env, n_steps, n_batch, reuse=False, **_kwargs):
    600         super(CnnPolicy, self).__init__(sess, ob_space, ac_space, n_env, n_steps, n_batch, reuse,
--> 601                                         feature_extraction="cnn", **_kwargs)
    602 
    603 

~/Documents/drl/research/venv_proc/lib/python3.6/site-packages/stable_baselines/common/policies.py in __init__(self, sess, ob_space, ac_space, n_env, n_steps, n_batch, reuse, layers, net_arch, act_fun, cnn_extractor, feature_extraction, **kwargs)
    538                  act_fun=tf.tanh, cnn_extractor=nature_cnn, feature_extraction="cnn", **kwargs):
    539         super(FeedForwardPolicy, self).__init__(sess, ob_space, ac_space, n_env, n_steps, n_batch, reuse=reuse,
--> 540                                                 scale=(feature_extraction == "cnn"))
    541 
    542         self._kwargs_check(feature_extraction, kwargs)

~/Documents/drl/research/venv_proc/lib/python3.6/site-packages/stable_baselines/common/policies.py in __init__(self, sess, ob_space, ac_space, n_env, n_steps, n_batch, reuse, scale)
    219     def __init__(self, sess, ob_space, ac_space, n_env, n_steps, n_batch, reuse=False, scale=False):
    220         super(ActorCriticPolicy, self).__init__(sess, ob_space, ac_space, n_env, n_steps, n_batch, reuse=reuse,
--> 221                                                 scale=scale)
    222         self._pdtype = make_proba_dist_type(ac_space)
    223         self._policy = None

~/Documents/drl/research/venv_proc/lib/python3.6/site-packages/stable_baselines/common/policies.py in __init__(self, sess, ob_space, ac_space, n_env, n_steps, n_batch, reuse, scale, obs_phs, add_action_ph)
    115         with tf.variable_scope("input", reuse=False):
    116             if obs_phs is None:
--> 117                 self._obs_ph, self._processed_obs = observation_input(ob_space, n_batch, scale=scale)
    118             else:
    119                 self._obs_ph, self._processed_obs = obs_phs

~/Documents/drl/research/venv_proc/lib/python3.6/site-packages/stable_baselines/common/input.py in observation_input(ob_space, batch_size, name, scale)
     49     else:
     50         raise NotImplementedError("Error: the model does not support input space of type {}".format(
---> 51             type(ob_space).__name__))

NotImplementedError: Error: the model does not support input space of type Dict
```
GNiendorf commented 4 years ago

If I just use gym.make as suggested, it works. The problem is that I can't specify the number of environments I want to train on (say, 200 as before), since the gym registration code calls ProcgenEnv with num_envs=1.

```python
env = gym.make("procgen:procgen-maze-v0", distribution_mode='easy')
model = PPO2(CnnPolicy, env, verbose=1)
model.learn(total_timesteps=10000)
```
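
One possible way to get several environments while staying on the gym.make route is to build the VecEnv from environment factory functions; this is only a sketch (the helper `make_env`, the choice of SubprocVecEnv, and `n_envs=8` are illustrative assumptions, and this loses Procgen's native C++ vectorization, so 200 separate gym envs may be heavy):

```python
import gym

from stable_baselines import PPO2
from stable_baselines.common.policies import CnnPolicy
from stable_baselines.common.vec_env import SubprocVecEnv  # or DummyVecEnv


def make_env():
    # Each worker builds its own single-env Procgen instance via gym.make
    return gym.make("procgen:procgen-maze-v0", distribution_mode='easy')


if __name__ == "__main__":
    n_envs = 8  # adjust; each env is a separate Procgen process
    venv = SubprocVecEnv([make_env for _ in range(n_envs)])
    model = PPO2(CnnPolicy, venv, verbose=1)
    model.learn(total_timesteps=10000)
```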
Miffyli commented 4 years ago

Ah right, because by default the environment works on Dicts. Am I correct to assume that this one did not work either?

```python
venv = ProcgenEnv(num_envs=200, env_name="coinrun")
venv = VecExtractDictObs(venv, "rgb")
model = PPO2(CnnPolicy, venv, verbose=1)
```

If that does not work, you can still create the environments manually. You would have to create a wrapper similar to VecExtractDictObs that extracts the "rgb" item from the observation dictionary for each environment individually (see the sketch below).
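
A minimal sketch of such a per-environment wrapper; the class name `ExtractRgbObs` is hypothetical, and it assumes each individual env returns a Dict observation containing an "rgb" key:

```python
import gym


class ExtractRgbObs(gym.ObservationWrapper):
    """Per-env analogue of VecExtractDictObs: return only obs["rgb"]."""

    def __init__(self, env):
        super().__init__(env)
        # Narrow the Dict observation space down to its "rgb" sub-space
        self.observation_space = env.observation_space.spaces["rgb"]

    def observation(self, observation):
        return observation["rgb"]


# usage (hypothetical): env = ExtractRgbObs(single_procgen_env)
```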

GNiendorf commented 4 years ago

That's correct, I get the same error as before (when including those extra venv calls):

```
ValueError: could not broadcast input array from shape (200,64,64,3) into shape (64,64,3)
```

GNiendorf commented 4 years ago

I see. I will attempt to create the wrapper. Thank you for the suggestion.

Miffyli commented 4 years ago

Thanks for trying this out! Sounds like a bug-ish thing we should fix, or at least we should offer tools so users don't have to redo the work of wrapping that environment.

araffin commented 4 years ago

> You have to create a wrapper similar to VecExtractDictObs that extracts the "rgb" item from the observation dictionary for each environment individually.

This already exists: https://github.com/openai/gym/blob/master/gym/wrappers/filter_observation.py

araffin commented 3 years ago

Fixed in SB3: https://github.com/DLR-RM/stable-baselines3/pull/311