**Closed** · GNiendorf closed this issue 3 years ago

I am trying to take an openai baselines environment (a vectorized procgen env) and convert it to a stable-baselines vectorized environment. Below is my naive attempt. I get an error saying that it expects a function, but I am not sure how to resolve the issue. Any help is appreciated.
Hello,
I think you should use the `gym.Env` (via `gym.make`, cf. the README) instead of the `ProcgenEnv`.
And why do you want to convert a `VecNormalize` to a `DummyVecEnv`? (`VecNormalize` is already a `VecEnv`...)
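For reference, the `gym.make` route from the procgen README would look roughly like this (a minimal sketch, using SB2's `PPO2` and the `CnnPolicy` that comes up later in this thread):

```python
import gym
from stable_baselines import PPO2
from stable_baselines.common.policies import CnnPolicy

# gym.make returns a single (non-vectorized) gym.Env for procgen
env = gym.make("procgen:procgen-coinrun-v0")
model = PPO2(CnnPolicy, env, verbose=1)
model.learn(total_timesteps=10000)
```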
I am trying to convert it because when I try to train a PPO agent directly on the venv I get the following error:
```
Wrapping the env in a DummyVecEnv.
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-12-42f191de9360> in <module>
5
6 model = PPO2(MlpPolicy, venv, verbose=1)
----> 7 model.learn(total_timesteps=10000)

~/Documents/drl/research/venv_proc/lib/python3.6/site-packages/stable_baselines/ppo2/ppo2.py in learn(self, total_timesteps, callback, log_interval, tb_log_name, reset_num_timesteps)
317 self._setup_learn()
318
--> 319 runner = Runner(env=self.env, model=self, n_steps=self.n_steps, gamma=self.gamma, lam=self.lam)
320 self.episode_reward = np.zeros((self.n_envs,))
321

~/Documents/drl/research/venv_proc/lib/python3.6/site-packages/stable_baselines/ppo2/ppo2.py in __init__(self, env, model, n_steps, gamma, lam)
447 :param lam: (float) Factor for trade-off of bias vs variance for Generalized Advantage Estimator
448 """
--> 449 super().__init__(env=env, model=model, n_steps=n_steps)
450 self.lam = lam
451 self.gamma = gamma

~/Documents/drl/research/venv_proc/lib/python3.6/site-packages/stable_baselines/common/runners.py in __init__(self, env, model, n_steps)
17 self.batch_ob_shape = (n_env*n_steps,) + env.observation_space.shape
18 self.obs = np.zeros((n_env,) + env.observation_space.shape, dtype=env.observation_space.dtype.name)
---> 19 self.obs[:] = env.reset()
20 self.n_steps = n_steps
21 self.states = model.initial_state

~/Documents/drl/research/venv_proc/lib/python3.6/site-packages/stable_baselines/common/vec_env/dummy_vec_env.py in reset(self)
51 for env_idx in range(self.num_envs):
52 obs = self.envs[env_idx].reset()
---> 53 self._save_obs(env_idx, obs)
54 return self._obs_from_buf()
55

~/Documents/drl/research/venv_proc/lib/python3.6/site-packages/stable_baselines/common/vec_env/dummy_vec_env.py in _save_obs(self, env_idx, obs)
70 for key in self.keys:
71 if key is None:
---> 72 self.buf_obs[key][env_idx] = obs
73 else:
74 self.buf_obs[key][env_idx] = obs[key]

ValueError: could not broadcast input array from shape (200,64,64,3) into shape (64,64,3)
```
Two things:

1. Use `CnnPolicy`, not `MlpPolicy` (the observations are images).
2. Try giving the venv to the `learn` method as it is (as sketched below).

There could be a chance the VecEnv coming out of ProcGen does not work in stable-baselines as it is, which should probably be addressed with a bugfix (it being such a nice environment).
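Concretely, combining both suggestions (a sketch, assuming `venv` is the ProcgenEnv from your snippet):

```python
from stable_baselines import PPO2
from stable_baselines.common.policies import CnnPolicy

# pass the procgen VecEnv directly, no DummyVecEnv/VecNormalize conversion
model = PPO2(CnnPolicy, venv, verbose=1)
model.learn(total_timesteps=10000)
```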
I get the same error with `CnnPolicy`:

```
ValueError: could not broadcast input array from shape (200,64,64,3) into shape (64,64,3)
```
Also, calling `gym.make` on a procgen environment actually calls `ProcgenEnv` under the hood. Here I forced an error with `gym.make` by giving it a bad keyword argument, and the traceback shows that it calls `ProcgenEnv` at the end:
```
~/Documents/drl/research/venv_proc/lib/python3.6/site-packages/gym/envs/registration.py in make(id, **kwargs)
154
155 def make(id, **kwargs):
--> 156 return registry.make(id, **kwargs)
157
158 def spec(id):

~/Documents/drl/research/venv_proc/lib/python3.6/site-packages/gym/envs/registration.py in make(self, path, **kwargs)
99 logger.info('Making new env: %s', path)
100 spec = self.spec(path)
--> 101 env = spec.make(**kwargs)
102 # We used to have people override _reset/_step rather than
103 # reset/step. Set _gym_disable_underscore_compat = True on

~/Documents/drl/research/venv_proc/lib/python3.6/site-packages/gym/envs/registration.py in make(self, **kwargs)
71 else:
72 cls = load(self.entry_point)
---> 73 env = cls(**_kwargs)
74
75 # Make the enviroment aware of which spec it came from.

~/Documents/drl/research/venv_proc/lib/python3.6/site-packages/procgen/gym_registration.py in make_env(**kwargs)
16
17 def make_env(**kwargs):
---> 18 venv = ProcgenEnv(num_envs=1, num_threads=0, **kwargs)
19 env = Scalarize(venv)
20 env = RemoveDictObs(env, key="rgb")

~/Documents/drl/research/venv_proc/lib/python3.6/site-packages/procgen/env.py in __init__(self, num_envs, env_name, center_agent, options, use_generated_assets, paint_vel_info, distribution_mode, **kwargs)
182 }
183 )
--> 184 super().__init__(num_envs, env_name, options, **kwargs)
```
PS: as mentioned in the issue template, please use the markdown code blocks for both code and stack traces.
EDIT: did you check the env with the `env_checker`? (also mentioned in the issue template)
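For reference, a minimal sketch of running the checker (note that it only accepts a plain `gym.Env`, not a VecEnv):

```python
import gym
from stable_baselines.common.env_checker import check_env

env = gym.make("procgen:procgen-coinrun-v0")
check_env(env)  # asserts the env inherits from gym.Env and follows the gym interface
```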
And to clarify @araffin's comment: did you remove the extra VecEnvs? What happens if you do just

```python
venv = ProcgenEnv(num_envs=200, env_name="coinrun")
model = PPO2(CnnPolicy, venv, verbose=1)
```
In both cases (removing the extra VecEnvs and keeping them) I get the following error when using `env_checker`:

```
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input-38-3559b4aab3c7> in <module>
----> 1 check_env(venv)

~/Documents/drl/research/venv_proc/lib/python3.6/site-packages/stable_baselines/common/env_checker.py in check_env(env, warn, skip_render_check)
179 True by default (useful for the CI)
180 """
--> 181 assert isinstance(env, gym.Env), ("You environment must inherit from gym.Env class "
182 " cf https://github.com/openai/gym/blob/master/gym/core.py")
183

AssertionError: You environment must inherit from gym.Env class cf https://github.com/openai/gym/blob/master/gym/core.py
```
And here, if I remove the extra calls, the same error:

```
Wrapping the env in a DummyVecEnv.
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-42-2f9ca1a80365> in <module>
7 venv = VecExtractDictObs(venv, "rgb")
8 model = PPO2(CnnPolicy, venv, verbose=1)
----> 9 model.learn(total_timesteps=10000)

~/Documents/drl/research/venv_proc/lib/python3.6/site-packages/stable_baselines/ppo2/ppo2.py in learn(self, total_timesteps, callback, log_interval, tb_log_name, reset_num_timesteps)
317 self._setup_learn()
318
--> 319 runner = Runner(env=self.env, model=self, n_steps=self.n_steps, gamma=self.gamma, lam=self.lam)
320 self.episode_reward = np.zeros((self.n_envs,))
321

~/Documents/drl/research/venv_proc/lib/python3.6/site-packages/stable_baselines/ppo2/ppo2.py in __init__(self, env, model, n_steps, gamma, lam)
447 :param lam: (float) Factor for trade-off of bias vs variance for Generalized Advantage Estimator
448 """
--> 449 super().__init__(env=env, model=model, n_steps=n_steps)
450 self.lam = lam
451 self.gamma = gamma

~/Documents/drl/research/venv_proc/lib/python3.6/site-packages/stable_baselines/common/runners.py in __init__(self, env, model, n_steps)
17 self.batch_ob_shape = (n_env*n_steps,) + env.observation_space.shape
18 self.obs = np.zeros((n_env,) + env.observation_space.shape, dtype=env.observation_space.dtype.name)
---> 19 self.obs[:] = env.reset()
20 self.n_steps = n_steps
21 self.states = model.initial_state

~/Documents/drl/research/venv_proc/lib/python3.6/site-packages/stable_baselines/common/vec_env/dummy_vec_env.py in reset(self)
51 for env_idx in range(self.num_envs):
52 obs = self.envs[env_idx].reset()
---> 53 self._save_obs(env_idx, obs)
54 return self._obs_from_buf()
55

~/Documents/drl/research/venv_proc/lib/python3.6/site-packages/stable_baselines/common/vec_env/dummy_vec_env.py in _save_obs(self, env_idx, obs)
70 for key in self.keys:
71 if key is None:
---> 72 self.buf_obs[key][env_idx] = obs
73 else:
74 self.buf_obs[key][env_idx] = obs[key]

ValueError: could not broadcast input array from shape (200,64,64,3) into shape (64,64,3)
```
If I remove the `VecExtractDictObs` call, though, I get this error:

```
Wrapping the env in a DummyVecEnv.
---------------------------------------------------------------------------
NotImplementedError                       Traceback (most recent call last)
<ipython-input-43-7e513d6644be> in <module>
5
6 venv = ProcgenEnv(num_envs=200, env_name="coinrun")
----> 7 model = PPO2(CnnPolicy, venv, verbose=1)
8 model.learn(total_timesteps=10000)

~/Documents/drl/research/venv_proc/lib/python3.6/site-packages/stable_baselines/ppo2/ppo2.py in __init__(self, policy, env, gamma, n_steps, ent_coef, learning_rate, vf_coef, max_grad_norm, lam, nminibatches, noptepochs, cliprange, cliprange_vf, verbose, tensorboard_log, _init_setup_model, policy_kwargs, full_tensorboard_log, seed, n_cpu_tf_sess)
102
103 if _init_setup_model:
--> 104 self.setup_model()
105
106 def _get_pretrain_placeholders(self):

~/Documents/drl/research/venv_proc/lib/python3.6/site-packages/stable_baselines/ppo2/ppo2.py in setup_model(self)
132
133 act_model = self.policy(self.sess, self.observation_space, self.action_space, self.n_envs, 1,
--> 134 n_batch_step, reuse=False, **self.policy_kwargs)
135 with tf.variable_scope("train_model", reuse=True,
136 custom_getter=tf_util.outer_scope_getter("train_model")):

~/Documents/drl/research/venv_proc/lib/python3.6/site-packages/stable_baselines/common/policies.py in __init__(self, sess, ob_space, ac_space, n_env, n_steps, n_batch, reuse, **_kwargs)
599 def __init__(self, sess, ob_space, ac_space, n_env, n_steps, n_batch, reuse=False, **_kwargs):
600 super(CnnPolicy, self).__init__(sess, ob_space, ac_space, n_env, n_steps, n_batch, reuse,
--> 601 feature_extraction="cnn", **_kwargs)
602
603

~/Documents/drl/research/venv_proc/lib/python3.6/site-packages/stable_baselines/common/policies.py in __init__(self, sess, ob_space, ac_space, n_env, n_steps, n_batch, reuse, layers, net_arch, act_fun, cnn_extractor, feature_extraction, **kwargs)
538 act_fun=tf.tanh, cnn_extractor=nature_cnn, feature_extraction="cnn", **kwargs):
539 super(FeedForwardPolicy, self).__init__(sess, ob_space, ac_space, n_env, n_steps, n_batch, reuse=reuse,
--> 540 scale=(feature_extraction == "cnn"))
541
542 self._kwargs_check(feature_extraction, kwargs)

~/Documents/drl/research/venv_proc/lib/python3.6/site-packages/stable_baselines/common/policies.py in __init__(self, sess, ob_space, ac_space, n_env, n_steps, n_batch, reuse, scale)
219 def __init__(self, sess, ob_space, ac_space, n_env, n_steps, n_batch, reuse=False, scale=False):
220 super(ActorCriticPolicy, self).__init__(sess, ob_space, ac_space, n_env, n_steps, n_batch, reuse=reuse,
--> 221 scale=scale)
222 self._pdtype = make_proba_dist_type(ac_space)
223 self._policy = None

~/Documents/drl/research/venv_proc/lib/python3.6/site-packages/stable_baselines/common/policies.py in __init__(self, sess, ob_space, ac_space, n_env, n_steps, n_batch, reuse, scale, obs_phs, add_action_ph)
115 with tf.variable_scope("input", reuse=False):
116 if obs_phs is None:
--> 117 self._obs_ph, self._processed_obs = observation_input(ob_space, n_batch, scale=scale)
118 else:
119 self._obs_ph, self._processed_obs = obs_phs

~/Documents/drl/research/venv_proc/lib/python3.6/site-packages/stable_baselines/common/input.py in observation_input(ob_space, batch_size, name, scale)
49 else:
50 raise NotImplementedError("Error: the model does not support input space of type {}".format(
--> 51 type(ob_space).__name__))

NotImplementedError: Error: the model does not support input space of type Dict
```
If I just use `gym.make` as suggested, it works. The problem is that I can't specify the number of environments I want to train on (say 200, like before), since the gym registration code calls `ProcgenEnv` with `num_envs=1`.

```python
env = gym.make("procgen:procgen-maze-v0", distribution_mode='easy')
model = PPO2(CnnPolicy, env, verbose=1)
model.learn(total_timesteps=10000)
```
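One possible workaround (a sketch, not from the thread) is to do the vectorization on the stable-baselines side instead, wrapping several `gym.make` envs in a `DummyVecEnv`:

```python
import gym
from stable_baselines import PPO2
from stable_baselines.common.policies import CnnPolicy
from stable_baselines.common.vec_env import DummyVecEnv

# each callable builds one single-agent procgen env;
# DummyVecEnv steps them sequentially in one process
env = DummyVecEnv([lambda: gym.make("procgen:procgen-maze-v0", distribution_mode='easy')
                   for _ in range(8)])
model = PPO2(CnnPolicy, env, verbose=1)
model.learn(total_timesteps=10000)
```

`SubprocVecEnv` would run the copies in parallel processes, though neither option matches ProcgenEnv's native C++ vectorization.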
Ah right, because by default the environment works with Dict observations. Am I correct to assume that this one did not work either?
```python
venv = ProcgenEnv(num_envs=200, env_name="coinrun")
venv = VecExtractDictObs(venv, "rgb")
model = PPO2(CnnPolicy, venv, verbose=1)
```
If that does not work, you can still create the environments manually. You have to create a wrapper similar to `VecExtractDictObs` that extracts the "rgb" item from the observation dictionary for each environment individually (see the sketch below).
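Such a wrapper could look like this (a minimal sketch; `ExtractDictObs` is a hypothetical name, to be applied to each single env before vectorizing):

```python
import gym

class ExtractDictObs(gym.ObservationWrapper):
    """Replace a Dict observation with the array stored under a single key."""

    def __init__(self, env, key="rgb"):
        super().__init__(env)
        self.key = key
        # narrow the observation space from the Dict to the chosen entry
        self.observation_space = env.observation_space.spaces[key]

    def observation(self, obs):
        return obs[self.key]
```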
That's correct, I get the same error as before (when including those extra venv calls):

```
ValueError: could not broadcast input array from shape (200,64,64,3) into shape (64,64,3)
```
I see. I will attempt to create the wrapper. Thank you for the suggestion.
Thanks for trying this out! Sounds like a bug-ish thing we should fix, or at least we should offer tools to avoid redoing the wrapping work for that environment.
> You have to create a wrapper similar to VecExtractDictObs that extracts the "rgb" item from the observation dictionary for each environment individually.

This already exists: https://github.com/openai/gym/blob/master/gym/wrappers/filter_observation.py
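For reference, a usage sketch of that gym wrapper (`ToyDictEnv` is an illustrative stand-in; note that `FilterObservation` drops keys but still returns a Dict observation, so it does not flatten it to a raw array):

```python
import gym
import numpy as np
from gym.wrappers import FilterObservation

class ToyDictEnv(gym.Env):
    """Tiny stand-in env with a Dict observation space (illustration only)."""
    observation_space = gym.spaces.Dict({
        "rgb": gym.spaces.Box(0, 255, (64, 64, 3), dtype=np.uint8),
        "extra": gym.spaces.Discrete(4),
    })
    action_space = gym.spaces.Discrete(2)

    def reset(self):
        return self.observation_space.sample()

    def step(self, action):
        return self.observation_space.sample(), 0.0, False, {}

env = FilterObservation(ToyDictEnv(), filter_keys=["rgb"])
obs = env.reset()  # still a dict, but now containing only the "rgb" key
```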
Fixed in SB3: https://github.com/DLR-RM/stable-baselines3/pull/311
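For anyone landing here later, the SB3 route looks roughly like this (a sketch assuming stable-baselines3 with its built-in `VecExtractDictObs`):

```python
from procgen import ProcgenEnv
from stable_baselines3 import PPO
from stable_baselines3.common.vec_env import VecExtractDictObs

venv = ProcgenEnv(num_envs=64, env_name="coinrun")
venv = VecExtractDictObs(venv, "rgb")  # pull the image array out of the Dict obs
model = PPO("CnnPolicy", venv, verbose=1)
model.learn(total_timesteps=10000)
```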