denizs closed this issue 7 years ago.
You can't have a picture without a window open.
You don't need a picture to learn the classic envs.
For visualizing a learned policy, it works just fine.
Why this issue?
I know that I can learn from the observations provided by `env.reset()` and `env.step()`; however, I'd like my agent to learn from pixel input and not from physical states, as is the case for the classic envs.
You want continuous control from pixels? Your best bet is Hopper (a super easy task) from Roboschool. You can adjust the camera angle in the code to suit your needs.
Let me close this, nothing to fix.
Ok thanks :)
This feature would also be useful if you're assembling the frames to visualize performance in, say, a Jupyter notebook hosted on a remote device.
I am also looking for the same feature. Hopper and car-racing are continuous control environments. It would be valuable to be able to experiment with the classic control envs in Gym with pixel observations and without having to open a pyglet window (which slows down training immensely and precludes training remotely on a cluster).
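For context, the usual workaround today is to wrap a state-based env so that `render(mode='rgb_array')` supplies the observations. A minimal sketch of that idea follows; the `PixelObservationWrapper` name and structure are my own, not part of Gym, and note that for classic control envs this still opens a pyglet window, which is exactly the limitation described above:

```python
# Hypothetical sketch: swap a classic control env's state observations
# for rendered RGB frames. Assumes the old Gym API where step() returns
# (obs, reward, done, info) and render(mode='rgb_array') returns pixels.

class PixelObservationWrapper:
    """Wrap an env so reset()/step() return rendered frames."""

    def __init__(self, env):
        self.env = env

    def reset(self):
        self.env.reset()  # discard the low-dimensional state vector
        return self.env.render(mode='rgb_array')

    def step(self, action):
        _, reward, done, info = self.env.step(action)
        # For classic control envs this still opens a pyglet window,
        # which is the limitation discussed in this thread.
        frame = self.env.render(mode='rgb_array')
        return frame, reward, done, info

    def close(self):
        self.env.close()
```

With `CartPole-v0` this works on the main thread, but the window requirement is what breaks A3C-style subprocess workers.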
Gym is a set of toy environments. These are used for testing and debugging code that will later be deployed on bigger problems.
CartPole-v0 is the most basic control problem: a discrete action space with very low dimensionality (4 features, 2 actions) and a nearly linear dynamics model. (I would guess the dynamics are linear in the 1st derivative.)
This means your testing cycle on any classic control problem is going to be MUCH shorter than the other gym environments.
I've run both experiments and hopper is a more difficult RL problem than any of the classic control problems, by at least 1 or 2 orders of magnitude. CartPole will get solved much faster by the same algorithm in nearly every case.
Conversely, if your algorithm cannot solve cartpole, then you know it won't solve anything. Therefore you have a bug. For hopper, this is not as clear, as hyperparameters do come into play.
Getting results faster means faster development times.
That's why as many features as possible around classic control problems are helpful.
If you just want to test your algorithm in a simple environment with pixel observations, I would recommend using Pong-v0 or similar Atari envs; according to the documentation, they return RGB screen images as observations by default. The action space is discrete as well, which might help with training or testing your algorithm.
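As a concrete sketch of that suggestion: Pong-v0 returns raw RGB frames of shape (210, 160, 3) directly from `reset()`/`step()`, so no `render()` call is needed. The `preprocess` helper below is an illustrative choice of my own (the crop bounds and downsampling factor are common but not canonical), and the gym usage is guarded so the sketch doesn't assume gym or the Atari ROMs are installed:

```python
import numpy as np

def preprocess(frame):
    """Crop, grayscale and downsample a 210x160x3 Atari frame to 80x80."""
    gray = frame[34:194].mean(axis=2)    # drop score/border rows, average RGB
    return gray[::2, ::2].astype(np.uint8)

try:
    import gym
    env = gym.make('Pong-v0')
    obs = env.reset()          # raw RGB frame, shape (210, 160, 3)
    small = preprocess(obs)    # shape (80, 80)
    env.close()
except Exception:
    pass  # gym or the Atari ROMs may be absent; the helper above still works
```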
I agree, Pong-v0 is my first choice for testing with pixel inputs.
The dynamics of Pong are pretty non-linear, though, because you have collisions. The input distribution also changes as you learn, due to the opponent AI. Pong also requires a little bit of exploration, whereas CartPole doesn't require any.
All good!
To be honest, it's not hard at all to hack Pygame to give you pixel inputs, or write your own cartpole simulator for that matter.
Perhaps I will do that and contribute to the project instead of banging on about it on a github thread :)
Short Version:
Expected Behaviour

`env.render(mode='rgb_array', close=True)` returns a numpy array containing the raw pixel representation of the current state.

Actual Behaviour

The call returns `None` for classic control `env`s leveraging `pyglet` to render, e.g. `CartPole-v0`, `MountainCar-v0`.

How to reproduce:

Reason

The rgb values are extracted from the window `pyglet` renders to. When calling `render` with `close=True`, opening a window is omitted, causing the observation to be `None`.

Consequences

All python-only `env`s rely on being executed on the main thread when learning from pixels (at least on OSX), as the OS doesn't allow UI changes on subprocesses. This makes them unsuitable for any parallel / asynchronous agents such as A3C.

Question

Is there any possibility to make `render(mode='rgb_array', close=True)` work for cc `env`s by e.g. directly writing the changes to an image buffer, making it suitable for A3C and other agents relying on multiple subprocesses?

Related Issues: #347
Slightly less short:
Hi all, I'm currently working on multiple RL agents, including an A3C implementation, which I'd like to use on classic control (cc) `env`s.

As those don't return their pixel representations as their observation, the agents retrieve the rgb values by calling `render(mode='rgb_array')` on the `env` instance. Unfortunately, this fails for cc `env`s when called on a subprocess as required by the A3C architecture, while working fine with e.g. `PongDeterministic-v3` and other Atari environments.

Reviewing the implementation, I understand that all cc `env`s leverage pyglet to render to a window and then obtain the rgb values from it, causing Python to crash in a parallel setting, as OSX doesn't allow UI operations on subprocesses.

After some research, I found #347, which introduced the `close` argument of `render()`. Unfortunately it returns `None` for cc `env`s, as those require a window to be open.

Is there any possibility to make `render(mode='rgb_array', close=True)` work for cc `env`s by e.g. directly writing the changes to an image buffer, making it suitable for A3C and other agents relying on multiple subprocesses?

I know that this is somewhat more of a pyglet-related issue, but in case someone can point me in the right direction, I'd be glad to put this into a PR. 🙂
Thanks in advance, Deniz
A minimal setup to reproduce this would be something like:
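The original snippet did not survive in this thread; a sketch of what it presumably looked like follows, assuming the 2017-era Gym API where `render()` accepted a `close=` keyword (per #347). The `frame_is_missing` helper is my own, and the live section is guarded because newer Gym versions removed `close=` and headless machines can't open a pyglet window:

```python
# Hypothetical reproduction sketch for the reported behaviour, assuming
# the 2017-era Gym API where render() accepted a close= keyword (#347).

def frame_is_missing(frame):
    """True when render() returned no pixel data (the reported bug)."""
    return frame is None

try:
    import gym
    env = gym.make('CartPole-v0')
    env.reset()
    ok = env.render(mode='rgb_array')  # opens a pyglet window, returns pixels
    broken = env.render(mode='rgb_array', close=True)  # per the report: None
    print(frame_is_missing(ok), frame_is_missing(broken))
    env.close()
except Exception:
    pass  # newer Gym removed close=, and headless machines can't open a window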