Closed lhorus closed 5 years ago
Hi,
First of all, thank you for this wonderful project, I can't stress it enough how badly baselines was in need of such a project.
Thanks =)
However, for some obscure reason, python never calls _init, for some obvious reason: even though it has no arguments, it is still a function hence, please replace it with 'return _init()'.
The code is correct: make_env should return a function that will be used to instantiate the environment (the return _init is correct).
In fact, SubprocVecEnv instantiates the environment here: https://github.com/hill-a/stable-baselines/blob/master/stable_baselines/common/vec_env/subproc_vec_env.py#L11 (for DummyVecEnv, here: https://github.com/hill-a/stable-baselines/blob/master/stable_baselines/common/vec_env/dummy_vec_env.py#L17)
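To make the deferred call concrete, here is a minimal sketch in plain Python. ToyEnv and ToyVecEnv are hypothetical stand-ins, not stable-baselines classes; they only mimic the pattern where the vectorized wrapper, and not make_env, is the one that calls each _init.

```python
class ToyEnv:
    """Hypothetical stand-in for a Gym environment."""
    instances = 0  # counts how many envs were actually built

    def __init__(self, seed):
        ToyEnv.instances += 1
        self.seed = seed


def make_env(seed):
    """Return a function that builds the env; do not build it here."""
    def _init():
        return ToyEnv(seed)
    return _init


class ToyVecEnv:
    """Hypothetical wrapper: it calls each factory itself, as a vec env would."""
    def __init__(self, env_fns):
        self.envs = [fn() for fn in env_fns]


env_fns = [make_env(i) for i in range(3)]
built_before = ToyEnv.instances   # nothing has been instantiated yet
vec_env = ToyVecEnv(env_fns)
built_after = ToyEnv.instances    # one env per factory
print(built_before, built_after)
```

Returning _init() instead of _init would build the env inside make_env and hand the wrapper an object it cannot call.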
Any ideas on how to fix this? I have implemented a simple Gym env, does it need to extend/implement SubprocVecEnv?
What happens if you use the code snippet from the example (with return _init)?
@araffin Thank you for your reply.
Well, if I use the snippet as is, nothing ever happens. In fact, if I add a small print() inside _init(), it doesn't even get called. As soon as the SubprocVecEnv environments are requested, it just seems to freeze.
Hi,
Did you try without multiprocessing and with a DummyVecEnv?
Does your custom env work with a random agent?
Random agent script:
# instantiate your custom env
env = MyEnv()
obs = env.reset()
# do 1000 random actions
for i in range(1000):
    action = env.action_space.sample()
    obs, reward, done, info = env.step(action)
    if done:
        obs = env.reset()
Random agents work fine, however, attempting to use SubprocVecEnv on a single instance of the environment yields an error:
Traceback (most recent call last):
  File "
  File "D:\Programs\Anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 705, in runfile
    execfile(filename, namespace)
  File "D:\Programs\Anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 102, in execfile
    exec(compile(f.read(), filename, 'exec'), namespace)
  File "C:/Users/X/Desktop/thesis.py", line 183, in
  File "D:\Programs\Anaconda3\lib\site-packages\stable_baselines\common\vec_env\subproc_vec_env.py", line 46, in __init__
    n_envs = len(env_fns)
TypeError: object of type 'BaselEnv' has no len()
That's normal, because you have to give a list of functions that each create an environment, not the env alone. You should first try with DummyVecEnv, cf. the getting started example: https://stable-baselines.readthedocs.io/en/master/guide/quickstart.html
Also, I would recommend you taking a look at "Using Custom Env" section from the documentation ;): https://stable-baselines.readthedocs.io/en/master/guide/custom_env.html
Thank you! I had missed those, didn't see them really.
Everything is according to the custom env guide, as for the DummyVecEnv, I hadn't tried it, and in fact, using it yields an error:
  File "", line 1, in
    runfile('C:/Users/X/Desktop/thesis.py', wdir='C:/Users/Horus/Desktop')
  File "D:\Programs\Anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 705, in runfile
    execfile(filename, namespace)
  File "D:\Programs\Anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 102, in execfile
    exec(compile(f.read(), filename, 'exec'), namespace)
  File "C:/Users/Horus/Desktop/thesis.py", line 188, in
    obs, reward, done, info = env.step(action)
  File "D:\Programs\Anaconda3\lib\site-packages\stable_baselines\common\vec_env\base_vec_env.py", line 94, in step
    return self.step_wait()
  File "D:\Programs\Anaconda3\lib\site-packages\stable_baselines\common\vec_env\dummy_vec_env.py", line 47, in step_wait
    self.envs[env_idx].step(self.actions[env_idx])
TypeError: 'int' object is not subscriptable
Can you post the associated code?
Did you use it like that (see doc)? (It's hard to help you if I don't know what you ran...)
# Instantiate and wrap the env
env = DummyVecEnv([lambda: CustomEnv(arg1, ...)])
Absolutely, and you are right, my bad. The environment itself is pretty standard - I deleted the code within functions as well, its irrelevant:
import os
import math
import sys
from collections import defaultdict
sys.path.append(r'c:\windows\system32\gym')
import gym
from gym import error, spaces, utils
from gym.utils import seeding
import numpy as np
from scipy import stats
from sklearn.preprocessing import StandardScaler

class BaselEnv(gym.Env):
    def __init__(self):
        super(BaselEnv, self).__init__()
        # state space
        self.action_space = spaces.Discrete(3000)
        #self.observation_space = spaces.Box(low=np.array([0, 0, 0]), high=np.array([250, 11, 7]), dtype=int)
        self.observation_space = spaces.Box(low=np.array([0, 0]), high=np.array([11, 250]), dtype=int)
        self.viewer = None
        self.steps_beyond_done = None
        self.seed()
        self.observation = self.reset()

    def step(self, action):
        assert self.action_space.contains(action), "%r (%s) invalid"%(action, type(action))
        (...)
        return self._get_obs(), reward, done, self._get_obs()

    def seed(self, seed=None):
        self.np_random, seed = seeding.np_random(seed)
        return [seed]

    def reset(self):
        (...)
        return self._get_obs()

    def _get_obs(self):
        return np.array([self.currentECCount, self.currentTtob])
And the simple test wrapper:
testenv = gym.make(env_id)
env = DummyVecEnv([lambda: testenv])
obs = env.reset()
for i in range(1000):
    action = env.action_space.sample()
    obs, reward, done, info = env.step(action)
    if done:
        obs = env.reset()
Ok, DummyVecEnv expects a list of actions (or array of actions), not an int, and reset() is handled automatically, so you don't have to check that done is set to true. Btw, why did you specify dtype=int for the Box? It looks weird to me.
Code that works for me:
import os
import math
import sys
from collections import defaultdict
import gym
from gym import error, spaces, utils
from gym.utils import seeding
import numpy as np
from scipy import stats
from sklearn.preprocessing import StandardScaler

class BaselEnv(gym.Env):
    def __init__(self):
        super(BaselEnv, self).__init__()
        # state space
        self.action_space = spaces.Discrete(3000)
        #self.observation_space = spaces.Box(low=np.array([0, 0, 0]), high=np.array([250, 11, 7]), dtype=int)
        self.observation_space = spaces.Box(low=np.array([0, 0]), high=np.array([11, 250]), dtype=np.float32)
        self.viewer = None
        self.steps_beyond_done = None
        self.seed()
        self.observation = self.reset()

    def step(self, action):
        assert self.action_space.contains(action), "%r (%s) invalid"%(action, type(action))
        reward = 0
        done = False
        return self._get_obs(), reward, done, self._get_obs()

    def seed(self, seed=None):
        self.np_random, seed = seeding.np_random(seed)
        return [seed]

    def reset(self):
        return self._get_obs()

    def _get_obs(self):
        return np.array([1.0, 2.0])

from stable_baselines.common.vec_env import DummyVecEnv
env = DummyVecEnv([lambda: BaselEnv()])
obs = env.reset()
for i in range(1000):
    action = env.action_space.sample()
    obs, reward, done, info = env.step([action])
Ok, DummyVecEnv expects a list of actions (or array of actions), not an int.
Sorry I'm not following, where exactly? The difference I spotted is you are constructing the environment object directly on the DummyVecEnv.
Btw, why did you specify dtype=int for the Box? It looks weird to me.
Sounds weird, I know. It's just that I have a discrete observation space, which is still quite large, and that way it's easier to handle the observations. I guess I could use a discrete state and collapse the observation. The bounds of the box are indices and my returned observation is an array of indices, hence the int.
Before:
obs, reward, done, info = env.step(action)
After:
obs, reward, done, info = env.step([action])
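A rough sketch of why the brackets matter, in plain Python. CountingEnv and ToyVecEnv are hypothetical and much simpler than the real DummyVecEnv; the only point is that a vectorized step indexes the actions per env, so a bare int fails in the same way as the traceback above.

```python
class CountingEnv:
    """Hypothetical single env: the reward is simply the action taken."""
    def step(self, action):
        obs, reward, done, info = 0, action, False, {}
        return obs, reward, done, info


class ToyVecEnv:
    """Hypothetical wrapper that indexes actions per env, like a vec env."""
    def __init__(self, envs):
        self.envs = envs

    def step(self, actions):
        # actions[i] goes to env i, so a bare int cannot be indexed here
        results = [env.step(actions[i]) for i, env in enumerate(self.envs)]
        obs, rewards, dones, infos = zip(*results)
        return list(obs), list(rewards), list(dones), list(infos)


vec_env = ToyVecEnv([CountingEnv()])

try:
    vec_env.step(7)            # a bare int, as in the "Before" line
    error = None
except TypeError as exc:
    error = str(exc)

obs, rewards, dones, infos = vec_env.step([7])   # one action per env
print(error)
print(rewards)
```

Note that every return value is also a batch: with a single env you still get lists of length one.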
my returned observation is an array of indices,
What you described is called "MultiDiscrete" space, see https://github.com/openai/gym/blob/master/gym/spaces/multi_discrete.py
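For illustration, here is the relation between the "array of indices" view and the single collapsed Discrete index mentioned earlier, sketched in plain Python. The sizes 12 and 251 are an assumption read off the Box bounds in this thread (0..11 and 0..250 inclusive), and both helper names are hypothetical. In Gym, the vector form would presumably be declared as spaces.MultiDiscrete([12, 251]).

```python
# Sizes assumed from the Box bounds above: values 0..11 and 0..250 inclusive.
NVEC = (12, 251)


def to_flat_index(obs, nvec=NVEC):
    """Collapse a vector of indices into one integer (row-major order)."""
    flat = 0
    for value, size in zip(obs, nvec):
        if not 0 <= value < size:
            raise ValueError(f"{value} is outside [0, {size})")
        flat = flat * size + value
    return flat


def from_flat_index(flat, nvec=NVEC):
    """Inverse of to_flat_index."""
    obs = []
    for size in reversed(nvec):
        flat, value = divmod(flat, size)
        obs.append(value)
    return tuple(reversed(obs))


# Number of distinct observations a single Discrete space would need.
n_states = 1
for size in NVEC:
    n_states *= size

flat = to_flat_index((3, 17))
print(n_states, flat, from_flat_index(flat))
```

Keeping the vector form avoids this bookkeeping, which is the main appeal of a MultiDiscrete space.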
I had no idea such a thing existed (MultiDiscrete). You are the most helpful person of all time. I'll give it a try and give some feedback afterwards so you can close the ticket, hopefully, and I won't waste more of your time.
On a side note, should I scale my rewards and observations? Given NN tend to learn better for smaller scales, or does baselines automatically do it for us? (for the actions and observations, I take it for rewards one has to do it manually)
You have what is called VecNormalize: https://stable-baselines.readthedocs.io/en/master/guide/vec_envs.html#vecnormalize I would recommend normalizing at least observation, I did not see any real difference for rewards. Please take a look at http://joschu.net/docs/nuts-and-bolts.pdf before scaling your reward.
And be sure to save the running average (for obs and reward), otherwise, when loading the model, it will behave differently.
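To show why the running average has to be saved, here is a minimal scalar sketch in plain Python. RunningStat is hypothetical and is not how VecNormalize is implemented; it only illustrates that a model trained on normalized observations sees very different inputs if the statistics are reset.

```python
import json
import math


class RunningStat:
    """Running mean and variance of a scalar stream (Welford's algorithm)."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0  # sum of squared deviations from the mean

    def update(self, x):
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)

    @property
    def std(self):
        return math.sqrt(self.m2 / self.count) if self.count else 1.0

    def normalize(self, x, eps=1e-8):
        return (x - self.mean) / (self.std + eps)

    def to_json(self):
        return json.dumps({"count": self.count, "mean": self.mean, "m2": self.m2})

    @classmethod
    def from_json(cls, text):
        stat = cls()
        state = json.loads(text)
        stat.count, stat.mean, stat.m2 = state["count"], state["mean"], state["m2"]
        return stat


train_stat = RunningStat()
for obs in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:
    train_stat.update(obs)

saved = train_stat.to_json()
restored = RunningStat.from_json(saved)
fresh = RunningStat()  # what you get if the statistics are not saved

print(restored.normalize(9.0), fresh.normalize(9.0))
```

With the restored statistics the observation 9.0 maps to roughly 2.0, while a fresh object leaves it near 9.0, so the loaded policy would be fed inputs on a scale it never saw in training.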
I will, I'll try it tonight or tomorrow and get back at you.
Btw did you really test the multi envs option via lambda on SubprocVecEnv([make_env(env_id, i) for i in range(numenvs)]) ? Cus make_env still doesn't work for me, doesn't call the environment constructor (_init())
make_env does not call the constructor; it is SubprocVecEnv that will instantiate the env.
Indeed, sorry, poor choice of words. It only gets instantiated once though - even though the range is of index 8. One instance is created and then the whole code just stays there, nothing else happens.
I don't know what you tried, but the following code prints n messages (where n is the number of parallel processes) and works fine.
import os
import math
import sys
from collections import defaultdict
import gym
from gym import error, spaces, utils
from gym.utils import seeding
import numpy as np
from scipy import stats
from sklearn.preprocessing import StandardScaler

class BaselEnv(gym.Env):
    def __init__(self):
        super(BaselEnv, self).__init__()
        print("Env initialized")
        # state space
        self.action_space = spaces.Discrete(3000)
        #self.observation_space = spaces.Box(low=np.array([0, 0, 0]), high=np.array([250, 11, 7]), dtype=int)
        self.observation_space = spaces.Box(low=np.array([0, 0]), high=np.array([11, 250]), dtype=np.float32)
        self.viewer = None
        self.steps_beyond_done = None
        self.seed()
        self.observation = self.reset()

    def step(self, action):
        assert self.action_space.contains(action), "%r (%s) invalid"%(action, type(action))
        reward = 0
        done = False
        return self._get_obs(), reward, done, self._get_obs()

    def seed(self, seed=None):
        self.np_random, seed = seeding.np_random(seed)
        return [seed]

    def reset(self):
        return self._get_obs()

    def _get_obs(self):
        return np.array([1.0, 2.0])

from stable_baselines.common.vec_env import SubprocVecEnv
n_envs = 2

def make_env(seed):
    def _init():
        env = BaselEnv()
        env.seed(seed)
        return env
    return _init

env = SubprocVecEnv([make_env(i) for i in range(n_envs)])
obs = env.reset()
for i in range(1000):
    action = [env.action_space.sample() for _ in range(n_envs)]
    obs, reward, done, info = env.step(action)
Can't quite thank you enough.
I tried running your code, this is becoming extremely frustrating so I assumed it was a coding problem.
Here is the only output I get whilst running a copy of your code:
runfile('C:/Users/X/Desktop/test0.py', wdir='C:/Users/X/Desktop')
D:\Programs\Anaconda3\lib\site-packages\h5py\__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
It should print. What can be causing this? I tried forcing a baselines update via
pip install git+git://github.com/openai/baselines.git
Still the problem persists.
This project is not OpenAI baselines (please see the install instructions here: https://stable-baselines.readthedocs.io/en/master/guide/install.html). What is your tf version? As you are using Windows, I cannot really help you further... The output on my machine:
Env initialized
Env initialized
It is rather strange, for some reason the loop itself isn't working on the SubprocVecEnv itself. Even running the loop outwards and appending to a list which is then fed onto SubprocVecEnv stalls the code on SubprocVecEnv call.
someList = list()
for i in range(n_envs):
someList.append(make_env(i))
print('Just a test')
env = SubprocVecEnv(someList) #Stalls here,
obs = env.reset()
And accordingly (using your code) prints:
Env initialized
Env initialized
Just a test
And then stops.
Oh and obviously, I had to use a slightly modified version of make_env (which is in accordance to this)
def make_env(seed):
    env = BaselEnv()
    env.seed(seed)
    return env
My tf version is 1.10.0 .
make_env should return a function, not a gym env. Please take a look at the example I sent you ;)
(This is also the case in the link you referenced: env = SubprocVecEnv([make_env]), not env = SubprocVecEnv([make_env()]).)
def make_env(seed):
    def _init():
        env = BaselEnv()
        env.seed(seed)
        return env
    return _init
Also, the code should print:
Just a test
Env initialized
Env initialized
Well, I definitely think the problem comes from your env...
Jeez sorry, its like amateur hour over here, too distracted.
Anyway, I am using your env, so I must assume the problem is windows itself. With this said, I'll have no option but to test SubProcEnv on baselines to see if the problem persists, perhaps they can find the issue. If not, well, non-threaded envs it is!
Thanks for your help and patience, seriously.
Closing because apparently not related to stable baselines. Feel free to add comments if you found a solution
Sorry for insisting, but I just can't let go such an amazing library, I just noticed this, not sure if it helps:
env = DummyVecEnv([make_env]) works fine; env = DummyVecEnv([lambda: make_env]) throws an error:
File "D:\Programs\Anaconda3\lib\site-packages\stable_baselines\common\vec_env\dummy_vec_env.py", line 19, in __init__
VecEnv.__init__(self, len(env_fns), env.observation_space, env.action_space)
AttributeError: 'function' object has no attribute 'observation_space'
Using your code however:
Yields this error:
env = SubprocVecEnv([make_env(i) for i in range(2)])
D:\Programs\Anaconda3\lib\site-packages\h5py\__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
D:\Programs\Anaconda3\lib\site-packages\h5py\__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
Process Process-2:
Process Process-1:
Traceback (most recent call last):
Traceback (most recent call last):
  File "D:\Programs\Anaconda3\lib\multiprocessing\process.py", line 258, in _bootstrap
    self.run()
  File "D:\Programs\Anaconda3\lib\multiprocessing\process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "D:\Programs\Anaconda3\lib\multiprocessing\process.py", line 258, in _bootstrap
    self.run()
  File "D:\Programs\Anaconda3\lib\site-packages\stable_baselines\common\vec_env\subproc_vec_env.py", line 11, in _worker
    env = env_fn_wrapper.var()
  File "D:\Programs\Anaconda3\lib\multiprocessing\process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
TypeError: 'NoneType' object is not callable
  File "D:\Programs\Anaconda3\lib\site-packages\stable_baselines\common\vec_env\subproc_vec_env.py", line 11, in _worker
    env = env_fn_wrapper.var()
TypeError: 'NoneType' object is not callable
Traceback (most recent call last):
  File "D:\Programs\Anaconda3\lib\multiprocessing\connection.py", line 312, in _recv_bytes
    nread, err = ov.GetOverlappedResult(True)
BrokenPipeError: [WinError 109] The pipe has been ended

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "", line 1, in
  File "D:\Programs\Anaconda3\lib\site-packages\stable_baselines\common\vec_env\subproc_vec_env.py", line 57, in __init__
    observation_space, action_space = self.remotes[0].recv()
  File "D:\Programs\Anaconda3\lib\multiprocessing\connection.py", line 250, in recv
    buf = self._recv_bytes()
  File "D:\Programs\Anaconda3\lib\multiprocessing\connection.py", line 321, in _recv_bytes
    raise EOFError
EOFError
Oddly enough, I too am experiencing problems with SubprocVecEnv. Using:
def make_env(seed):
    def _init():
        env = BaselEnv()
        env.seed(seed)
        return env
    return _init
with
env = DummyVecEnv([make_env(i) for i in range(2)])
Works fine. However, if SubprocVecEnv is used instead, it just hangs there forever, not even initializing the env once. I first assumed the problem was in my env, however, the same occurs with CartPole-v1. Any ideas on what may be wrong?
On a side note, I noticed that with my env the learning algorithms stutter: random agents perform fine, but ACKTR, for instance, runs some steps, then stops, then runs some steps, and so on. Is this normal? Does the algorithm only update its weights after X steps?
Hello, To help you, i need your full config. Are you also on Windows ?
And yes, a2c and its derivatives (acktr is one of them), only update weights after n steps.
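The stop-and-go pattern can be pictured with a toy loop in plain Python. This is only a sketch of the cadence, not the A2C or ACKTR implementation; run_training is hypothetical and the value n_steps=5 is an arbitrary choice for the example.

```python
def run_training(total_steps, n_steps):
    """Return the step indices at which a weight update would happen."""
    update_at = []
    rollout = []
    for step in range(1, total_steps + 1):
        rollout.append(step)          # stand-in for (obs, action, reward)
        if len(rollout) == n_steps:   # batch is full: pause and update
            update_at.append(step)
            rollout.clear()
    return update_at


updates = run_training(total_steps=20, n_steps=5)
print(updates)
```

Between two updates the env is stepped continuously; during an update no steps are taken, which is what shows up as stuttering when the env prints or renders.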
At least the stuttering is explained!
Windows as well, Windows 10. Numpy is 1.14.3. Gym is 0.10.5. Scipy is 1.1.0.
Ok, so I suspect Windows to be the problem... SubprocVecEnv is automatically tested by the continuous integration (Travis) and we never experienced that issue. I think I will add a note in the doc.
So, I recommend you to either use docker or change your OS... (I know that the last solution is a bit radical ;) )
Ahah sounds a tad radical indeed!
I don't have a place to run Docker, but I'll definitely try running all of this on Google Colab and report back to you! :)
Alright!
So I got the project up and running on Colab (if only I knew your tutorials were Colab compatible, oh well), and it is indeed a Windows problem, at least on Colab the problem seems to have vanished.
Small problem though: using CartPole-v1 works fine, using my env doesn't, as it freezes on the first update, and it hangs there forever. I tried testing it vs DummyVecEnv, and it seems that for some reason, using SubprocVecEnv doesn't even call step (I print a small message at the end of the step function). Any idea on what exactly might be causing this? Although DummyVecEnv does call step, I'm not sure if learning is occurring, as its a very slow env, and only one verbose is displayed (the initial one).
Would you mind terribly engaging in direct contact? Seems like this way it's just a lot longer and doesn't truly benefit the community, as it's a problem with my deadbeat brain. I compared it with CartPole's source and I can't spot any significant difference.
Hi, Yes, stable-baselines is compatible with colab ;) (and we provide example notebooks in the documentation)
it is indeed a Windows problem
Unfortunately, not a surprise for me... If you fix it, please let us know ;)
Would you mind terribly engaging in direct contact?
I'm sorry but I don't have time for personal debugging. Maintaining stable-baselines along the S-RL Toolbox repo is already a lot of additional work for me.
Yes, I tested that code. With ACKTR and SubprocVecEnv or DummyVecEnv, "step" is called (I added a print to do a quick check). Tested on Ubuntu 18.04, with the master version of stable-baselines and Python 3.6.
import os
import math
import sys
from collections import defaultdict
import gym
from gym import error, spaces, utils
from gym.utils import seeding
import numpy as np
from scipy import stats
from sklearn.preprocessing import StandardScaler

class BaselEnv(gym.Env):
    def __init__(self):
        super(BaselEnv, self).__init__()
        # state space
        self.action_space = spaces.Discrete(3000)
        #self.observation_space = spaces.Box(low=np.array([0, 0, 0]), high=np.array([250, 11, 7]), dtype=int)
        self.observation_space = spaces.Box(low=np.array([0, 0]), high=np.array([11, 250]), dtype=np.float32)
        self.viewer = None
        self.steps_beyond_done = None
        self.seed()
        self.observation = self.reset()

    def step(self, action):
        assert self.action_space.contains(action), "%r (%s) invalid"%(action, type(action))
        reward = 0
        done = False
        print("step")
        return self._get_obs(), reward, done, self._get_obs()

    def seed(self, seed=None):
        self.np_random, seed = seeding.np_random(seed)
        return [seed]

    def reset(self):
        return self._get_obs()

    def _get_obs(self):
        return np.array([1.0, 2.0])

from stable_baselines.common.vec_env import DummyVecEnv, SubprocVecEnv
from stable_baselines import ACKTR
env = SubprocVecEnv([lambda: BaselEnv()])
ACKTR("MlpPolicy", env).learn(100)
# obs = env.reset()
# for i in range(1000):
#     action = env.action_space.sample()
#     obs, reward, done, info = env.step([action])
It prints? That's odd, check this JNotebook, printing is absolutely ignored within the env whilst learning, even the model's verbose 2 seem to not be displayed (only verbose level 1).
It prints but only if learn is in the same cell where the model is defined: https://colab.research.google.com/drive/1Hkv_nly2IAdeV-rcoSXkJrBsbGEFb-QX
I think that has something to do with how Python notebooks handle print from sub-processes (with DummyVecEnv, it prints every time). I don't think that is an issue.
You're a genius. Everything seems to be in working order.
Learning is unbelievably slow with 3k actions, but I guess that's normal. On a side note, the repo's readme mentions that MultiDiscrete isn't implemented for ACKTR and some other algorithms. I assume it refers to the action_space, and not to the observation_space?
Learning is unbelievably slow with 3k actions, but I guess that's normal.
That's completely normal.
On a side note, the repo's readme mentions that MultiDiscrete isn't implemented for ACKTR and some other algorithms. I assume it refers to the action_space, and not to the observation_space?
The support table is both for action and observation spaces (The support test is done here: https://github.com/hill-a/stable-baselines/blob/master/tests/test_action_space.py#L20).
Important to know, as my ACKTR was using a MultiDiscrete observation space, will swap it then. Thank you.
First of all, thank you for this wonderful project, I can't stress it enough how badly baselines was in need of such a project.
Now, the Multiprocessing Tutorial created by stable-baselines (see) states that the following is to be used to generate multiple envs - as an example of course:
However, for some obscure reason, python never calls _init, for some obvious reason: even though it has no arguments, it is still a function hence, please replace it with 'return _init()'.
Secondly, even doing so results in an error when building the SubprocVecEnv([make_env(env_id, i) for i in range(numenvs)]), namely:
Any ideas on how to fix this? I have implemented a simple Gym env, does it need to extend/implement SubprocVecEnv?