env.render() in doc examples doesn't run unless n_envs=1 in make_vec_env()

hill-a / stable-baselines

A fork of OpenAI Baselines, implementations of reinforcement learning algorithms

http://stable-baselines.readthedocs.io/

MIT License


env.render() in doc examples doesn't run unless n_envs=1 in make_vec_env() #945

Closed: pstansell closed this issue 4 years ago

pstansell commented 4 years ago

This is possibly quite trivial, but the env.render() lines in the code examples do not run for multiprocess environments unless n_envs=1 is passed to make_vec_env().

For example, see the code example on this page:

https://stable-baselines.readthedocs.io/en/master/modules/ppo2.html

I'm using stable-baselines 2.10.0.

The error is

Traceback (most recent call last):
  File "<stdin>", line 4, in <module>
  File "/lib/python3.6/site-packages/stable_baselines/common/vec_env/dummy_vec_env.py", line 85, in render
    return super().render(*args, **kwargs)
TypeError: render() missing 1 required positional argument: 'mode'

Miffyli commented 4 years ago

Thanks for reporting this. Indeed, the example code does not work. The issue seems to lie in the VecEnv definition: it does not define a default render mode, whereas in Gym's Env the default is "human", and judging by the examples, that should be the default for VecEnvs too.

However, when I tried env.render("human") on the example, something else broke in the image handling. That needs a bit more time to look into.
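The TypeError in the original report can be reproduced without Gym or Stable Baselines. The classes below are hypothetical stand-ins, not the real VecEnv/DummyVecEnv code: a subclass forwards *args/**kwargs to a base render() whose mode argument has no default, so a bare env.render() call fails, while giving mode a default of "human" (as gym.Env.render does) makes it work. This is a minimal sketch, assuming this simplified signature captures the problem described above:

```python
# Hypothetical stand-ins (NOT the real stable-baselines classes), used only
# to reproduce the signature mismatch behind the reported TypeError.
class VecEnvNoDefault:
    def render(self, mode):  # 'mode' is required: there is no default value
        return "render mode: " + mode


class DummyVecEnvNoDefault(VecEnvNoDefault):
    def render(self, *args, **kwargs):
        # Forwards whatever it was given; env.render() passes nothing,
        # so the base class never receives 'mode'.
        return super().render(*args, **kwargs)


class VecEnvWithDefault:
    def render(self, mode="human"):  # same default as gym.Env.render
        return "render mode: " + mode


class DummyVecEnvWithDefault(VecEnvWithDefault):
    def render(self, mode="human"):
        return super().render(mode=mode)


error_message = None
try:
    DummyVecEnvNoDefault().render()
except TypeError as err:
    error_message = str(err)
# Ends with: missing 1 required positional argument: 'mode'
print(error_message)

print(DummyVecEnvWithDefault().render())  # render mode: human
```

The exact prefix of the error message depends on the Python version (newer versions include the class name), but the missing 'mode' argument is the same failure shown in the traceback above.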

pstansell commented 4 years ago

A quick workaround is to recreate the environment with n_envs=1 before running env.render().

For example:

import gym

from stable_baselines.common.policies import MlpPolicy
from stable_baselines.common import make_vec_env
from stable_baselines import PPO2

# multiprocess environment
env = make_vec_env('CartPole-v1', n_envs=4)

model = PPO2(MlpPolicy, env, verbose=1)
model.learn(total_timesteps=25000)
model.save("ppo2_cartpole")

del model # remove to demonstrate saving and loading

model = PPO2.load("ppo2_cartpole")

# Enjoy trained agent
env = make_vec_env('CartPole-v1', n_envs=1)       # <------ NEW LINE: same env the model was trained on, single instance
obs = env.reset()
while True:
    action, _states = model.predict(obs)
    obs, rewards, dones, info = env.step(action)
    env.render()

Miffyli commented 4 years ago

Yeah, the code works for single environments, but it should also work for multiple environments (it draws a neat tiled image with cv2). I will look into this later today.
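For several environments, the vectorized render() tiles the sub-environment frames into one image before displaying it with cv2. The function below is a hypothetical NumPy-only sketch of that tiling step, written to illustrate the idea; it is not the library's tile_images, though it starts with the same shape unpacking that appears in the traceback later in this thread. It assumes every frame is an (H, W, C) array of the same size and pads an incomplete grid with black frames:

```python
import numpy as np


def tile_images_sketch(img_nhwc):
    """Tile N frames of shape (H, W, C) into one roughly square grid image.

    Hypothetical re-implementation for illustration only.
    """
    img_nhwc = np.asarray(img_nhwc)
    n_images, height, width, n_channels = img_nhwc.shape
    grid_h = int(np.ceil(np.sqrt(n_images)))
    grid_w = int(np.ceil(n_images / grid_h))
    # Pad with black frames so every grid cell is filled.
    padding = np.zeros(
        (grid_h * grid_w - n_images, height, width, n_channels), dtype=img_nhwc.dtype
    )
    img_nhwc = np.concatenate([img_nhwc, padding], axis=0)
    # (grid_h, grid_w, H, W, C) -> (grid_h, H, grid_w, W, C) -> (grid_h*H, grid_w*W, C)
    grid = img_nhwc.reshape(grid_h, grid_w, height, width, n_channels)
    grid = grid.transpose(0, 2, 1, 3, 4)
    return grid.reshape(grid_h * height, grid_w * width, n_channels)


# Four dummy 400x600 RGB frames, one per sub-environment.
frames = np.zeros((4, 400, 600, 3), dtype=np.uint8)
big_img = tile_images_sketch(frames)
print(big_img.shape)  # (800, 1200, 3)
```

With 4 environments the frames land in a 2 x 2 grid; with 3, the fourth cell is left black.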

araffin commented 4 years ago

~@Miffyli I think I forgot to push that to SB2: https://github.com/DLR-RM/stable-baselines3/pull/43~

EDIT: I did it here: https://github.com/hill-a/stable-baselines/pull/880

araffin commented 4 years ago

@pstansell, could you upgrade to the master version (see the docs)?

Miffyli commented 4 years ago

@araffin With master, render() defaults to "human" as expected, but this error is still raised:

~\Desktop\stable-baselines\stable_baselines\common\vec_env\dummy_vec_env.py in render(self, mode)
     85             return self.envs[0].render(mode=mode)
     86         else:
---> 87             return super().render(mode=mode)
     88 
     89     def _save_obs(self, env_idx, obs):

~\Desktop\stable-baselines\stable_baselines\common\vec_env\base_vec_env.py in render(self, mode)
    169 
    170         # Create a big image by tiling images from subprocesses
--> 171         bigimg = tile_images(imgs)
    172         if mode == 'human':
    173             import cv2  # pytype:disable=import-error

~\Desktop\stable-baselines\stable_baselines\common\tile_images.py in tile_images(img_nhwc)
     13     """
     14     img_nhwc = np.asarray(img_nhwc)
---> 15     n_images, height, width, n_channels = img_nhwc.shape
     16     # new_height was named H before
     17     new_height = int(np.ceil(np.sqrt(n_images)))

ValueError: not enough values to unpack (expected 4, got 1)

araffin commented 4 years ago

@Miffyli what version are you using?

The following code works for me (using latest master version):

from stable_baselines.common.cmd_util import make_vec_env

n_envs = 4
env = make_vec_env('CartPole-v1', n_envs=n_envs)

obs = env.reset()
for _ in range(100):
    env.step([env.action_space.sample() for _ in range(n_envs)])
    env.render()

(It also works with SB3.)

Miffyli commented 4 years ago

Ah yes, I did a derp. I forgot to reset the environment before trying to render it :'). Everything works as expected.
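The ValueError above follows from that missing reset(). Assuming each un-reset sub-environment's render returns None rather than a frame (which is what the "got 1" in the traceback suggests), np.asarray turns the list into a 1-D object array and the four-way shape unpacking at the start of tile_images fails. The helper name unpack_nhwc is hypothetical; it only copies those first two lines so the failure can be seen in isolation:

```python
import numpy as np


def unpack_nhwc(imgs):
    # Hypothetical helper mirroring the first two lines of tile_images
    # shown in the traceback.
    img_nhwc = np.asarray(imgs)
    n_images, height, width, n_channels = img_nhwc.shape
    return n_images, height, width, n_channels


# Assumption: before reset(), each of the 4 sub-environments returns None.
frames_before_reset = [None] * 4
error_before_reset = None
try:
    unpack_nhwc(frames_before_reset)
except ValueError as err:
    error_before_reset = str(err)
print(error_before_reset)  # not enough values to unpack (expected 4, got 1)

# After reset(), each sub-environment returns an (H, W, C) RGB frame.
frames_after_reset = [np.zeros((400, 600, 3), dtype=np.uint8) for _ in range(4)]
print(unpack_nhwc(frames_after_reset))  # (4, 400, 600, 3)
```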

pstansell commented 4 years ago

Thanks for the amazingly quick resolution!