openai / gym

A toolkit for developing and comparing reinforcement learning algorithms.
https://www.gymlibrary.dev

Classic control `render(mode='rgb_array', close=True)` returns `None` #659

Closed (denizs closed this issue 7 years ago)

denizs commented 7 years ago

Short Version:

Expected Behaviour

env.render(mode='rgb_array', close=True) returns a numpy array containing the raw pixel representation of the current state.

Actual Behaviour

The call returns None for classic control envs that use pyglet to render, e.g. CartPole-v0 and MountainCar-v0.

How to reproduce:

import gym

env = gym.make('CartPole-v0')
env.reset()
obs = env.render(mode='rgb_array', close=True)
print(obs is None)  # >>> True

Reason

The RGB values are extracted from the window pyglet renders to. When render is called with close=True, no window is opened, so the returned observation is None.
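
For reference, the classic control Viewer reads the frame back from the on-screen window's colour buffer, roughly like this (paraphrased sketch, not the exact gym source; read_frame is just a stand-in name):

import numpy as np
import pyglet

def read_frame(window):
    window.switch_to()
    window.dispatch_events()
    # ... the env's geometry is drawn here ...
    # The pixels are read back from the window's colour buffer,
    # which is why a window has to exist at all.
    buf = pyglet.image.get_buffer_manager().get_color_buffer()
    image_data = buf.get_image_data()
    arr = np.frombuffer(image_data.get_data('RGBA', image_data.width * 4),
                        dtype=np.uint8)
    arr = arr.reshape(image_data.height, image_data.width, 4)
    return arr[::-1, :, :3]  # flip vertically, drop the alpha channel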

Consequences

All Python-only envs rely on being executed on the main thread when learning from pixels (at least on OSX), since the OS doesn't allow UI operations on subprocesses. This makes them unsuitable for any parallel / asynchronous agents such as A3C.

Question

Is there any way to make render(mode='rgb_array', close=True) work for cc envs, e.g. by writing the frame directly to an image buffer, making them suitable for A3C and other agents that rely on multiple subprocesses?

Related Issues:

#347

Slightly less short:

Hi all, I'm currently working on multiple RL agents, including an A3C implementation, which I'd like to use on classic control (cc) envs.

As those envs don't return a pixel representation as their observation, the agents retrieve the RGB values by calling render(mode='rgb_array') on the env instance.

Unfortunately, this fails for cc envs when being called on a subprocess as required by the A3C architecture, while working fine with e.g. PongDeterministic-v3 and other Atari environments.

Reviewing the implementation, I understand that all cc envs use pyglet to render to a window and then obtain the RGB values from it, causing Python to crash in a parallel setting, as OSX doesn't allow UI operations on subprocesses.

After some research, I found #347, which introduced the close argument of render(). Unfortunately, it returns None for cc envs, as those require a window to be open.

Is there any way to make render(mode='rgb_array', close=True) work for cc envs, e.g. by writing the frame directly to an image buffer, making them suitable for A3C and other agents that rely on multiple subprocesses?

I know that this is somewhat more of a pyglet-related issue, but in case someone can point me in the right direction, I'd be glad to put this into a PR. 🙂

Thanks in advance, Deniz

A minimal setup to reproduce this would be something like:

import gym
from multiprocessing import Process


class Worker(Process):
    def __init__(self, env_name, name=None, render_rgb=False):
        Process.__init__(self, name=name)
        # Note: the env is created in the parent process here; the
        # subprocess only steps and renders it.
        self.env = gym.make(env_name)
        self.env.reset()
        self.render_rgb = render_rgb
        print('Environment initialized. {}'.format(self.name))

    def run(self):
        for _ in range(100):
            action = self.env.action_space.sample()
            obs, reward, done, _ = self.env.step(action)
            print(obs)
            if self.render_rgb:
                # This is the call that fails off the main thread for
                # the pyglet-based classic control envs.
                observation = self.env.render(mode='rgb_array')
            if done:
                print('Done with an episode for {}'.format(self.name))
                self.env.reset()


class Agent(object):

    def __init__(self, env_name, num_env, render_rgb=False):
        assert num_env > 0, 'Number of environments must be positive.'
        self.num_env = num_env
        self.workers = []

        for env_idx in range(num_env):
            env_worker = Worker(env_name, name=str(env_idx),
                                render_rgb=render_rgb)
            self.workers.append(env_worker)

        for w in self.workers:
            w.start()

        for w in self.workers:
            w.join()


if __name__ == '__main__':

    first = Agent('MountainCar-v0', 4, render_rgb=False)  # works

    second = Agent('PongDeterministic-v3', 4, render_rgb=True)  # works

    third = Agent('MountainCar-v0', 4, render_rgb=True)  # will fail

olegklimov commented 7 years ago

You can't have a picture without a window open.

You don't need the picture to learn the classic envs.

For visualizing a learned policy, it works just fine.

Why this issue?

denizs commented 7 years ago

I know that I can learn from the observations provided by env.reset() and env.step(); however, I'd like my agent to learn from pixel input and not from the physical state, as is the case for classic envs.
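
For what it's worth, what I have in mind is something along these lines (untested sketch; depending on the gym version the override point may be _observation instead of observation). It still needs a window, which is exactly the limitation above:

import gym

class PixelObservations(gym.ObservationWrapper):
    """Sketch: swap the physical state vector for the rendered frame."""

    def observation(self, observation):
        # Ignore the low-dimensional state and return raw pixels instead.
        # This still opens a pyglet window, so for the classic control
        # envs it only works on the main process.
        return self.env.render(mode='rgb_array')

env = PixelObservations(gym.make('CartPole-v0'))
obs = env.reset()                                   # (H, W, 3) uint8 frame
obs, reward, done, info = env.step(env.action_space.sample())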

olegklimov commented 7 years ago

You want continuous control from pixels? Your best bet is Hopper (a super easy task) from Roboschool. You can adjust the camera angle in the code to suit your needs.

Let me close this, nothing to fix.

denizs commented 7 years ago

Ok thanks :)

samhaaf commented 6 years ago

This feature would also be useful if you're assembling the frames to visualize performance in, say, a Jupyter notebook hosted on a remote device.

andrecavalcante commented 6 years ago

This might help https://github.com/hardmaru/WorldModelsExperiments/blob/master/carracing/render_env.py

abagaria commented 4 years ago

I am also looking for the same feature. Hopper and car-racing are continuous control environments. It would be valuable to be able to experiment with the classic control envs in Gym with pixel observations and without having to open a pyglet window (which slows down training immensely and precludes training remotely on a cluster).
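
One workaround I've seen for headless machines (assuming Xvfb and the third-party pyvirtualdisplay package are installed) is to start a virtual display before creating the env, e.g.:

import gym
from pyvirtualdisplay import Display

# Start a virtual X display so pyglet has something to render into,
# even on a headless cluster node.
display = Display(visible=0, size=(1400, 900))
display.start()

env = gym.make('CartPole-v0')
env.reset()
frame = env.render(mode='rgb_array')   # works without a visible window
print(frame.shape)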

DuaneNielsen commented 4 years ago

Gym is a set of toy environments. These are used for testing and debugging code that will later be deployed on bigger problems.

CartPole-v0 is the most basic control problem: a discrete action space with very low dimensionality (4 features, 2 actions) and a nearly linear dynamics model. (I would guess the dynamics are linear in the first derivative.)

This means your testing cycle on any classic control problem is going to be MUCH shorter than the other gym environments.

I've run both experiments and hopper is a more difficult RL problem than any of the classic control problems, by at least 1 or 2 orders of magnitude. CartPole will get solved much faster by the same algorithm in nearly every case.

Conversely, if your algorithm cannot solve cartpole, then you know it won't solve anything. Therefore you have a bug. For hopper, this is not as clear, as hyperparameters do come into play.

Getting results faster means faster development times.

That's why as many features as possible around classic control problems are helpful.

BarisYazici commented 4 years ago

If you just want to test your algorithm in a simple environment with pixel observations, I would recommend using Pong-v0 or similar Atari envs; according to the documentation, they support RGB screen image observations by default. The action space is discrete as well, which might help with training or testing your algorithm.
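
For example (quick sanity check; shapes quoted from memory):

import gym

env = gym.make('Pong-v0')
obs = env.reset()
print(obs.shape, obs.dtype)   # (210, 160, 3) uint8 -- the raw screen,
                              # no render() call needed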

DuaneNielsen commented 4 years ago

I agree, Pong-v0 is my first choice for testing with pixel inputs.

The dynamics of Pong are pretty non-linear though, because you have collisions. Also, the input distribution changes as you learn, due to the opponent AI. Pong also requires a little bit of exploration, whereas cartpole doesn't require any.

All good!

To be honest, it's not hard at all to hack Pygame to give you pixel inputs, or write your own cartpole simulator for that matter.

Perhaps I will do that and contribute to the project instead of banging on about it on a github thread :)
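
For what it's worth, here is the kind of thing I mean: a toy, windowless rasterizer that turns CartPole's 4-d state into a small image with plain numpy (all sizes and scales are arbitrary, purely for illustration):

import gym
import numpy as np

def state_to_frame(state, size=64):
    """Rasterize CartPole's (x, x_dot, theta, theta_dot) state into a
    small grayscale image. All scales here are arbitrary choices."""
    x, _, theta, _ = state
    frame = np.zeros((size, size), dtype=np.uint8)
    # Cart: a short horizontal bar near the bottom; x ranges over +-2.4.
    cart_col = int(np.clip((x + 2.4) / 4.8 * (size - 1), 0, size - 1))
    lo, hi = max(cart_col - 4, 0), min(cart_col + 4, size - 1)
    frame[size - 6:size - 3, lo:hi + 1] = 255
    # Pole: a line of pixels leaning by theta from the top of the cart.
    for i in range(size // 2):
        row = size - 7 - i
        col = int(np.clip(cart_col + i * np.tan(theta), 0, size - 1))
        frame[row, col] = 255
    return frame

env = gym.make('CartPole-v0')
frame = state_to_frame(env.reset())
print(frame.shape)   # (64, 64), no window involved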
