hill-a / stable-baselines

A fork of OpenAI Baselines, implementations of reinforcement learning algorithms
http://stable-baselines.readthedocs.io/
MIT License

Multithreading broken pipeline on custom Env #40

Closed lhorus closed 5 years ago

lhorus commented 5 years ago

First of all, thank you for this wonderful project; I can't stress enough how badly baselines was in need of such a project.

Now, the Multiprocessing Tutorial created by stable-baselines states that the following is to be used to generate multiple envs (as an example, of course):

def make_env(env_id, rank, seed=0):
    """
    Utility function for multiprocessed env.

    :param env_id: (str) the environment ID
    :param rank: (int) index of the subprocess
    :param seed: (int) the initial seed for RNG
    """
    def _init():
        env = gym.make(env_id)
        env.seed(seed + rank)
        return env
    set_global_seeds(seed)
    return _init

However, for some obscure reason, Python never calls _init. The reason seemed obvious to me: even though it takes no arguments, it is still a function, so return _init returns the function itself rather than calling it. Hence, please replace it with return _init().

Secondly, even doing so results in an error when building the SubprocVecEnv([make_env(env_id, i) for i in range(numenvs)]), namely:

Traceback (most recent call last):
  File "", line 1, in
    runfile('C:/Users/X/Desktop/thesis.py', wdir='C:/Users/X/Desktop')
  File "D:\Programs\Anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 705, in runfile
    execfile(filename, namespace)
  File "D:\Programs\Anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 102, in execfile
    exec(compile(f.read(), filename, 'exec'), namespace)
  File "C:/Users/X/Desktop/thesis.py", line 133, in
    env = SubprocVecEnv([make_env(env_id, i) for i in range(numenvs)])
  File "D:\Programs\Anaconda3\lib\site-packages\stable_baselines\common\vec_env\subproc_vec_env.py", line 52, in __init__
    process.start()
  File "D:\Programs\Anaconda3\lib\multiprocessing\process.py", line 105, in start
    self._popen = self._Popen(self)
  File "D:\Programs\Anaconda3\lib\multiprocessing\context.py", line 223, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "D:\Programs\Anaconda3\lib\multiprocessing\context.py", line 322, in _Popen
    return Popen(process_obj)
  File "D:\Programs\Anaconda3\lib\multiprocessing\popen_spawn_win32.py", line 65, in __init__
    reduction.dump(process_obj, to_child)
  File "D:\Programs\Anaconda3\lib\multiprocessing\reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
BrokenPipeError: [Errno 32] Broken pipe

Any ideas on how to fix this? I have implemented a simple Gym env; does it need to extend/implement SubprocVecEnv?

araffin commented 5 years ago

Hi,

First of all, thank you for this wonderful project; I can't stress enough how badly baselines was in need of such a project.

Thanks =)

However, for some obscure reason, Python never calls _init. The reason seemed obvious to me: even though it takes no arguments, it is still a function, so return _init returns the function itself rather than calling it. Hence, please replace it with return _init().

The code is correct: make_env should return a function that will be used to instantiate the environment (the return _init is correct). In fact, SubprocVecEnv instantiates the environment here: https://github.com/hill-a/stable-baselines/blob/master/stable_baselines/common/vec_env/subproc_vec_env.py#L11 (for DummyVecEnv here: https://github.com/hill-a/stable-baselines/blob/master/stable_baselines/common/vec_env/dummy_vec_env.py#L17)
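To make this concrete, here is a minimal sketch of the two phases (illustration only, not the exact library code):

# phase 1: make_env only builds a closure, nothing is instantiated yet
env_fn = make_env("CartPole-v1", rank=0)
# phase 2: the SubprocVecEnv worker calls the closure inside the child
# process, and only then does gym.make actually run
env = env_fn()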

Any ideas on how to fix this? I have implemented a simply Gym env, does it need to extend/implement SubprocVecEnv?

What happens if you use the code snippet from the example (with return _init)?

lhorus commented 5 years ago

@araffin Thank you for your reply.

Well, if I use the snippet as is, nothing ever happens. In fact, if I add a small print() inside _init(), it never even gets called. As soon as SubprocVecEnv is asked to create the environments, everything just seems to freeze.

araffin commented 5 years ago

Hi, did you try without multiprocessing, using a DummyVecEnv?

Does your custom env work with a random agent?

Random agent script:

# instantiate your custom env
env = MyEnv()
obs = env.reset()
# do 1000 random actions
for i in range(1000):
    action = env.action_space.sample()
    obs, reward, done, info = env.step(action)
    if done:
        obs = env.reset()
lhorus commented 5 years ago

Random agents work fine; however, attempting to use SubprocVecEnv on a single instance of the environment yields an error:

Traceback (most recent call last):
  File "", line 1, in
    runfile('C:/Users/X/Desktop/thesis.py', wdir='C:/Users/Horus/Desktop')
  File "D:\Programs\Anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 705, in runfile
    execfile(filename, namespace)
  File "D:\Programs\Anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 102, in execfile
    exec(compile(f.read(), filename, 'exec'), namespace)
  File "C:/Users/X/Desktop/thesis.py", line 183, in
    env = SubprocVecEnv(testenv)
  File "D:\Programs\Anaconda3\lib\site-packages\stable_baselines\common\vec_env\subproc_vec_env.py", line 46, in __init__
    n_envs = len(env_fns)
TypeError: object of type 'BaselEnv' has no len()

araffin commented 5 years ago

That's normal: you have to give a list of environments (functions that create them, really), not the env alone. You should first try with DummyVecEnv, cf. the getting started example: https://stable-baselines.readthedocs.io/en/master/guide/quickstart.html
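For instance (a minimal sketch, assuming your BaselEnv constructor takes no arguments):

from stable_baselines.common.vec_env import DummyVecEnv

# a list containing a single env-creating function
env = DummyVecEnv([lambda: BaselEnv()])
# or several copies of the env, all running in the same process
env = DummyVecEnv([lambda: BaselEnv() for _ in range(4)])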

araffin commented 5 years ago

Also, I would recommend taking a look at the "Using Custom Env" section of the documentation ;): https://stable-baselines.readthedocs.io/en/master/guide/custom_env.html

lhorus commented 5 years ago

Thank you! I had missed those; I really didn't see them.

Everything is in accordance with the custom env guide. As for DummyVecEnv, I hadn't tried it, and in fact using it yields an error:

File "", line 1, in runfile('C:/Users/X/Desktop/thesis.py', wdir='C:/Users/Horus/Desktop')

File "D:\Programs\Anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 705, in runfile execfile(filename, namespace)

File "D:\Programs\Anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 102, in execfile exec(compile(f.read(), filename, 'exec'), namespace)

File "C:/Users/Horus/Desktop/thesis.py", line 188, in obs, reward, done, info = env.step(action)

File "D:\Programs\Anaconda3\lib\site-packages\stable_baselines\common\vec_env\base_vec_env.py", line 94, in step return self.step_wait()

File "D:\Programs\Anaconda3\lib\site-packages\stable_baselines\common\vec_env\dummy_vec_env.py", line 47, in step_wait self.envs[env_idx].step(self.actions[env_idx])

TypeError: 'int' object is not subscriptable

araffin commented 5 years ago

Can you post the associated code?

Did you use it like that (see doc)? (It's hard to help you if I don't know what you ran...)

# Instantiate and wrap the env
env = DummyVecEnv([lambda: CustomEnv(arg1, ...)])
lhorus commented 5 years ago

Absolutely, and you are right, my bad. The environment itself is pretty standard (I deleted the code within the functions; it's irrelevant):

import os
import math
import sys
from collections import defaultdict

sys.path.append(r'c:\windows\system32\gym')

import gym
from gym import error, spaces, utils

from gym.utils import seeding

import numpy as np
from scipy import stats
from sklearn.preprocessing import StandardScaler

class BaselEnv(gym.Env):

    def __init__(self):
        super(BaselEnv, self).__init__()

        # state space
        self.action_space = spaces.Discrete(3000)
        #self.observation_space = spaces.Box(low=np.array([0, 0, 0]), high=np.array([250, 11, 7]), dtype=int)
        self.observation_space = spaces.Box(low=np.array([0, 0]), high=np.array([11, 250]), dtype=int)
        self.viewer = None
        self.steps_beyond_done = None
        self.seed()

        self.observation = self.reset()

    def step(self, action):
        assert self.action_space.contains(action), "%r (%s) invalid"%(action, type(action))
        (...)
        return self._get_obs(), reward, done, self._get_obs()

    def seed(self, seed=None):
        self.np_random, seed = seeding.np_random(seed)
        return [seed]

    def reset(self):
        (...)        
        return self._get_obs()

    def _get_obs(self):
        return np.array([self.currentECCount, self.currentTtob])

And the simple test wrapper:

testenv = gym.make(env_id)

env = DummyVecEnv([lambda: testenv])
obs = env.reset()

for i in range(1000):
    action = env.action_space.sample()
    obs, reward, done, info = env.step(action)
    if done:
        obs = env.reset()
araffin commented 5 years ago

Ok, DummyVecEnv expects a list of actions (or an array of actions), not an int, and reset() is handled automatically: you don't have to check that done is set to true. Btw, why did you specify dtype=int for the Box? It looks weird to me.

Code that works for me:

import os
import math
import sys
from collections import defaultdict

import gym
from gym import error, spaces, utils

from gym.utils import seeding

import numpy as np
from scipy import stats
from sklearn.preprocessing import StandardScaler

class BaselEnv(gym.Env):

    def __init__(self):
        super(BaselEnv, self).__init__()

        # state space
        self.action_space = spaces.Discrete(3000)
        #self.observation_space = spaces.Box(low=np.array([0, 0, 0]), high=np.array([250, 11, 7]), dtype=int)
        self.observation_space = spaces.Box(low=np.array([0, 0]), high=np.array([11, 250]), dtype=np.float32)
        self.viewer = None
        self.steps_beyond_done = None
        self.seed()

        self.observation = self.reset()

    def step(self, action):
        assert self.action_space.contains(action), "%r (%s) invalid"%(action, type(action))
        reward = 0
        done = False
        return self._get_obs(), reward, done, self._get_obs()

    def seed(self, seed=None):
        self.np_random, seed = seeding.np_random(seed)
        return [seed]

    def reset(self):
        return self._get_obs()

    def _get_obs(self):
        return np.array([1.0, 2.0])

from stable_baselines.common.vec_env import DummyVecEnv

env = DummyVecEnv([lambda: BaselEnv()])
obs = env.reset()

for i in range(1000):
    action = env.action_space.sample()
    obs, reward, done, info = env.step([action])
lhorus commented 5 years ago

Ok, DummyVecEnv expects a list of actions (or an array of actions), not an int.

Sorry, I'm not following: where exactly? The difference I spotted is that you are constructing the environment object directly in the DummyVecEnv.

Btw, why did you specify dtype=int for the Box? It looks weird to me.

Sounds weird, I know. It's just that I have a discrete observation space (which is still quite large), and that way it's easier to handle the observations. I guess I could use a discrete state and collapse the observation, but the bounds of the box are indexes and my returned observation is an array of indices, hence the int.

araffin commented 5 years ago

Before:

obs, reward, done, info = env.step(action)

After:

obs, reward, done, info = env.step([action])

my returned observation is an array of indices,

What you described is called "MultiDiscrete" space, see https://github.com/openai/gym/blob/master/gym/spaces/multi_discrete.py
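For your bounds above it would look something like this (a sketch; MultiDiscrete takes the number of values per dimension, so an index range of 0..11 needs 12):

from gym import spaces

# inside BaselEnv.__init__:
# first dimension has 12 possible indices (0..11),
# second dimension has 251 possible indices (0..250)
self.observation_space = spaces.MultiDiscrete([12, 251])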

lhorus commented 5 years ago

I had no idea such a thing existed (MultiDiscrete). You are the most helpful person of all time. I'll give it a try and give some feedback afterwards so you can close the ticket (hopefully) and I won't waste more of your time.

On a side note, should I scale my rewards and observations, given that NNs tend to learn better at smaller scales, or does baselines automatically do it for us? (For the actions and observations, that is; I take it for rewards one has to do it manually.)

araffin commented 5 years ago

You have what is called VecNormalize: https://stable-baselines.readthedocs.io/en/master/guide/vec_envs.html#vecnormalize I would recommend normalizing at least the observations; I did not see any real difference for rewards. Please take a look at http://joschu.net/docs/nuts-and-bolts.pdf before scaling your reward.

And be sure to save the running average (for obs and reward); otherwise, when loading the model, it will behave differently.
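Something along these lines (a minimal sketch; check the VecNormalize doc for the exact arguments):

from stable_baselines.common.vec_env import DummyVecEnv, VecNormalize

env = DummyVecEnv([lambda: BaselEnv()])
# normalize observations, leave rewards untouched
env = VecNormalize(env, norm_obs=True, norm_reward=False)

# ... train your model here ...

# persist the running averages so a loaded model
# sees observations scaled the same way
env.save_running_average("./")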

lhorus commented 5 years ago

I will; I'll try it tonight or tomorrow and get back to you.

Btw, did you really test the multi-env option via lambda with SubprocVecEnv([make_env(env_id, i) for i in range(numenvs)])? Because make_env still doesn't work for me: it doesn't call the environment constructor (_init()).

araffin commented 5 years ago

make_env does not call the constructor; it is SubprocVecEnv that instantiates the env.

lhorus commented 5 years ago

Indeed, sorry, poor choice of words. It only gets instantiated once, though, even though the range is 8. One instance is created and then the whole code just stays there; nothing else happens.

araffin commented 5 years ago

I don't know what you tried, but the following code prints n messages (where n is the number of parallel processes) and works fine.

import os
import math
import sys
from collections import defaultdict

import gym
from gym import error, spaces, utils

from gym.utils import seeding

import numpy as np
from scipy import stats
from sklearn.preprocessing import StandardScaler

class BaselEnv(gym.Env):

    def __init__(self):
        super(BaselEnv, self).__init__()
        print("Env initialized")
        # state space
        self.action_space = spaces.Discrete(3000)
        #self.observation_space = spaces.Box(low=np.array([0, 0, 0]), high=np.array([250, 11, 7]), dtype=int)
        self.observation_space = spaces.Box(low=np.array([0, 0]), high=np.array([11, 250]), dtype=np.float32)
        self.viewer = None
        self.steps_beyond_done = None
        self.seed()

        self.observation = self.reset()

    def step(self, action):
        assert self.action_space.contains(action), "%r (%s) invalid"%(action, type(action))
        reward = 0
        done = False
        return self._get_obs(), reward, done, self._get_obs()

    def seed(self, seed=None):
        self.np_random, seed = seeding.np_random(seed)
        return [seed]

    def reset(self):
        return self._get_obs()

    def _get_obs(self):
        return np.array([1.0, 2.0])

from stable_baselines.common.vec_env import SubprocVecEnv

n_envs = 2

def make_env(seed):
    def _init():
        env = BaselEnv()
        env.seed(seed)
        return env
    return _init

env = SubprocVecEnv([make_env(i) for i in range(n_envs)])
obs = env.reset()

for i in range(1000):
    action = [env.action_space.sample() for _ in range(n_envs)]
    obs, reward, done, info = env.step(action)
lhorus commented 5 years ago

Can't quite thank you enough.

I tried running your code; this is becoming extremely frustrating, so I assumed it was a coding problem.

Here is the only output I get whilst running a copy of your code:

runfile('C:/Users/X/Desktop/test0.py', wdir='C:/Users/X/Desktop')
D:\Programs\Anaconda3\lib\site-packages\h5py\__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from float to np.floating is deprecated. In future, it will be treated as np.float64 == np.dtype(float).type.
  from ._conv import register_converters as _register_converters

It should print. What could be causing this? I tried forcing a baselines update via

pip install git+git://github.com/openai/baselines.git

Still the problem persists.

araffin commented 5 years ago

This project is not openai baselines (please see the install instructions here: https://stable-baselines.readthedocs.io/en/master/guide/install.html). What is your tf version? As you are using Windows, I cannot really help you further... The output on my machine:

Env initialized
Env initialized
lhorus commented 5 years ago

It is rather strange: for some reason the list comprehension feeding SubprocVecEnv isn't working. Even running the loop separately and appending to a list which is then fed to SubprocVecEnv stalls the code on the SubprocVecEnv call.

someList = list()

for i in range(n_envs):
    someList.append(make_env(i))
print('Just a test')
env = SubprocVecEnv(someList)  # Stalls here
obs = env.reset()

And accordingly (using your code) prints:

Env initialized
Env initialized
Just a test

And then stops.

Oh, and obviously, I had to use a slightly modified version of make_env (which is in accordance with this):

def make_env(seed):
    env = BaselEnv()
    env.seed(seed)
    return env

My tf version is 1.10.0.

araffin commented 5 years ago

make_env should return a function, not a gym env. Please take a look at the example I sent you ;) (This is also the case in the link you referenced: env = SubprocVecEnv([make_env]), not env = SubprocVecEnv([make_env()]).)


def make_env(seed):
    def _init():
        env = BaselEnv()
        env.seed(seed)
        return env
    return _init

Also, the code should print:

Just a test
Env initialized
Env initialized

Well, I definitely think the problem comes from your env...

lhorus commented 5 years ago

Jeez, sorry, it's like amateur hour over here; too distracted.

Anyway, I am using your env, so I must assume the problem is Windows itself. With this said, I'll have no option but to test SubprocVecEnv on baselines to see if the problem persists; perhaps they can find the issue. If not, well, non-threaded envs it is!

Thanks for your help and patience, seriously.

araffin commented 5 years ago

Closing, because this is apparently not related to stable baselines. Feel free to add comments if you find a solution.

lhorus commented 5 years ago

Sorry for insisting, but I just can't let go of such an amazing library. I just noticed this, not sure if it helps:

env = DummyVecEnv([make_env]) works fine, but env = DummyVecEnv([lambda: make_env]) throws an error:

  File "D:\Programs\Anaconda3\lib\site-packages\stable_baselines\common\vec_env\dummy_vec_env.py", line 19, in __init__
    VecEnv.__init__(self, len(env_fns), env.observation_space, env.action_space)
AttributeError: 'function' object has no attribute 'observation_space'

Running a verbatim copy of your code from above ("the following code prints n messages and works fine"), however, yields this error:

env = SubprocVecEnv([make_env(i) for i in range(2)])
D:\Programs\Anaconda3\lib\site-packages\h5py\__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from float to np.floating is deprecated. In future, it will be treated as np.float64 == np.dtype(float).type.
  from ._conv import register_converters as _register_converters
(the same FutureWarning is printed by the second process)
Process Process-1:
Traceback (most recent call last):
  File "D:\Programs\Anaconda3\lib\multiprocessing\process.py", line 258, in _bootstrap
    self.run()
  File "D:\Programs\Anaconda3\lib\multiprocessing\process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "D:\Programs\Anaconda3\lib\site-packages\stable_baselines\common\vec_env\subproc_vec_env.py", line 11, in _worker
    env = env_fn_wrapper.var()
TypeError: 'NoneType' object is not callable
(the same traceback is printed for Process Process-2)
Traceback (most recent call last):
  File "D:\Programs\Anaconda3\lib\multiprocessing\connection.py", line 312, in _recv_bytes
    nread, err = ov.GetOverlappedResult(True)
BrokenPipeError: [WinError 109] The pipe has been ended

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "", line 1, in
  File "D:\Programs\Anaconda3\lib\site-packages\stable_baselines\common\vec_env\subproc_vec_env.py", line 57, in __init__
    observation_space, action_space = self.remotes[0].recv()
  File "D:\Programs\Anaconda3\lib\multiprocessing\connection.py", line 250, in recv
    buf = self._recv_bytes()
  File "D:\Programs\Anaconda3\lib\multiprocessing\connection.py", line 321, in _recv_bytes
    raise EOFError
EOFError

Guilherme-B commented 5 years ago

Oddly enough, I too am experiencing problems with SubprocVecEnv. Using:

def make_env(seed):
    def _init():
        env = BaselEnv()
        env.seed(seed)
        return env
    return _init

with env = DummyVecEnv([make_env(i) for i in range(2)])

Works fine. However, if SubprocVecEnv is used instead, it just hangs there forever, not even initializing the env once. I first assumed the problem was in my env; however, the same occurs with CartPole-v1. Any ideas on what may be wrong?

On a side note, I noticed that when using my env, learning algorithms stutter; meaning, random agents perform fine, but ACKTR, for instance, runs some steps, then stops, then runs some steps, and so on. Is this normal? Does the algorithm only update its weights after X steps?

araffin commented 5 years ago

Hello, to help you I need your full config. Are you also on Windows?

And yes, A2C and its derivatives (ACKTR is one of them) only update weights after n steps.
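The interval is controlled by the n_steps parameter (a quick sketch; 20 is, if I remember correctly, the default for ACKTR):

from stable_baselines import ACKTR

# gradients are only computed every n_steps * n_envs transitions,
# which is why the env appears to run in bursts
model = ACKTR("MlpPolicy", env, n_steps=20, verbose=1)
model.learn(total_timesteps=10000)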

Guilherme-B commented 5 years ago

At least the stuttering is explained!

Windows as well, Windows 10. Numpy is 1.14.3. Gym is 0.10.5. Scipy is 1.1.0.

araffin commented 5 years ago

Ok, so I suspect Windows to be the problem... SubprocVecEnv is automatically tested by the continuous integration (Travis) and we never experienced that issue. I think I will add a note in the doc.

So, I recommend you either use Docker or change your OS... (I know the last solution is a bit radical ;))
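One more thing worth checking on Windows (a sketch, untested on my side): Windows starts subprocesses with spawn, which re-imports the main module in each child, so the SubprocVecEnv creation should be protected by a main guard:

if __name__ == '__main__':
    # without this guard, each spawned child re-executes the module-level
    # code when it re-imports this file, which can hang or break the pipe
    env = SubprocVecEnv([make_env(i) for i in range(2)])
    obs = env.reset()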

Guilherme-B commented 5 years ago

Ahah, sounds a tad radical indeed!

I don't have a place to run Docker, but I'll definitely try running all of this on Google Colab and report back to you! :)

Guilherme-B commented 5 years ago

Alright!

So I got the project up and running on Colab (if only I had known your tutorials were Colab compatible, oh well), and it is indeed a Windows problem; at least on Colab the problem seems to have vanished.

Small problem though: using CartPole-v1 works fine, but using my env doesn't, as it freezes on the first update and hangs there forever. I tried testing it against DummyVecEnv, and it seems that for some reason SubprocVecEnv doesn't even call step (I print a small message at the end of the step function). Any idea what exactly might be causing this? Although DummyVecEnv does call step, I'm not sure learning is occurring, as it's a very slow env and only one verbose message is displayed (the initial one).

Would you mind terribly engaging in direct contact? It seems like this way it just takes a lot longer and doesn't truly benefit the community, as it's a problem with my deadbeat brain. I compared it with CartPole's source and I can't spot any significant difference.

araffin commented 5 years ago

Hi, yes, stable-baselines is compatible with Colab ;) (and we provide example notebooks in the documentation)

it is indeed a Windows problem

Unfortunately, not a surprise for me... If you fix it, please let us know ;)

Would you mind terribly engaging in direct contact?

I'm sorry but I don't have time for personal debugging. Maintaining stable-baselines along with the S-RL Toolbox repo is already a lot of additional work for me.

araffin commented 5 years ago

Yes, I tested that code. With ACKTR and SubprocVecEnv or DummyVecEnv, "step" is called (I added a print to do a quick check). Tested on Ubuntu (18.04), with the master version of stable-baselines and Python 3.6.

import os
import math
import sys
from collections import defaultdict

import gym
from gym import error, spaces, utils

from gym.utils import seeding

import numpy as np
from scipy import stats
from sklearn.preprocessing import StandardScaler

class BaselEnv(gym.Env):

    def __init__(self):
        super(BaselEnv, self).__init__()

        # state space
        self.action_space = spaces.Discrete(3000)
        #self.observation_space = spaces.Box(low=np.array([0, 0, 0]), high=np.array([250, 11, 7]), dtype=int)
        self.observation_space = spaces.Box(low=np.array([0, 0]), high=np.array([11, 250]), dtype=np.float32)
        self.viewer = None
        self.steps_beyond_done = None
        self.seed()

        self.observation = self.reset()

    def step(self, action):
        assert self.action_space.contains(action), "%r (%s) invalid"%(action, type(action))
        reward = 0
        done = False
        print("step")
        return self._get_obs(), reward, done, self._get_obs()

    def seed(self, seed=None):
        self.np_random, seed = seeding.np_random(seed)
        return [seed]

    def reset(self):
        return self._get_obs()

    def _get_obs(self):
        return np.array([1.0, 2.0])

from stable_baselines.common.vec_env import DummyVecEnv, SubprocVecEnv
from stable_baselines import ACKTR

env = SubprocVecEnv([lambda: BaselEnv()])

ACKTR("MlpPolicy", env).learn(100)

# obs = env.reset()

# for i in range(1000):
#     action = env.action_space.sample()
#     obs, reward, done, info = env.step([action])
Guilherme-B commented 5 years ago

It prints? That's odd; check the notebook below: printing is absolutely ignored within the env whilst learning, and even the model's verbose level 2 messages seem not to be displayed (only verbose level 1).

Colab Jupyter Notebook here.

araffin commented 5 years ago

It prints, but only if learn is in the same cell where the model is defined: https://colab.research.google.com/drive/1Hkv_nly2IAdeV-rcoSXkJrBsbGEFb-QX

I think that has something to do with how Python notebooks handle prints from sub-processes (with DummyVecEnv, it prints every time). I don't think that is an issue.
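If you ever need those subprocess prints to show up more reliably, explicitly flushing stdout may help (a guess on my side, nothing specific to stable-baselines):

# inside the env's step(), force the message out of the child's buffer
print("step", flush=True)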

Guilherme-B commented 5 years ago

It prints, but only if learn is in the same cell where the model is defined: https://colab.research.google.com/drive/1Hkv_nly2IAdeV-rcoSXkJrBsbGEFb-QX

I think that has something to do with how Python notebooks handle prints from sub-processes (with DummyVecEnv, it prints every time). I don't think that is an issue.

You're a genius. Everything seems to be in working order.

Learning is unbelievably slow with 3k actions, but I guess that's normal. On a side note, the repo's readme mentions that MultiDiscrete isn't implemented for ACKTR and some other algorithms. I assume it refers to the action_space, and not to the observation_space?

araffin commented 5 years ago

Learning is unbelievably slow with 3k actions, but I guess that's normal.

That's completely normal.

On a side note, the repo's readme mentions that MultiDiscrete isn't implemented for ACKTR and some other algorithms. I assume it refers to the action_space, and not to the observation_space?

The support table is both for action and observation spaces (The support test is done here: https://github.com/hill-a/stable-baselines/blob/master/tests/test_action_space.py#L20).

Guilherme-B commented 5 years ago

The support table is both for action and observation spaces (The support test is done here: https://github.com/hill-a/stable-baselines/blob/master/tests/test_action_space.py#L20).

Important to know, as my ACKTR was using a MultiDiscrete observation space; I will swap it then. Thank you.