rlworkgroup / garage

A toolkit for reproducible reinforcement learning research.
MIT License
1.88k stars · 310 forks

Custom environment #1345

Closed sdfzz closed 4 years ago

sdfzz commented 4 years ago

Hi all,

I'm trying to use garage for my RL project. I've installed garage, and I can run examples such as 'trpo_cartpole.py'. However, I'm having difficulties creating my own environment.

It seems that two examples of how to create an environment are provided:

  1. garage/docs/user/implement_env.rst (https://github.com/rlworkgroup/garage/blob/master/docs/user/implement_env.rst) Here I'm stuck at "from garage.envs.base import Env" where I got this error:

    ImportError: cannot import name 'Env'

  2. garage/examples/jupyter/custom_env.ipynb (https://github.com/rlworkgroup/garage/blob/master/examples/jupyter/custom_env.ipynb) I've copy-and-pasted code, but I'm getting the following error:

Traceback (most recent call last):
  File "/home/ubuntu/TF1/lib/python3.5/site-packages/garage/experiment/local_runner.py", line 279, in step_epochs
    self._start_worker()
  File "/home/ubuntu/TF1/lib/python3.5/site-packages/garage/experiment/local_runner.py", line 100, in _start_worker
    self.sampler.start_worker()
  File "/home/ubuntu/TF1/lib/python3.5/site-packages/garage/sampler/on_policy_vectorized_sampler.py", line 39, in start_worker
    envs = [pickle.loads(pickle.dumps(self.env)) for _ in range(n_envs)]
  File "/home/ubuntu/TF1/lib/python3.5/site-packages/garage/sampler/on_policy_vectorized_sampler.py", line 39, in <listcomp>
    envs = [pickle.loads(pickle.dumps(self.env)) for _ in range(n_envs)]
_pickle.PicklingError: Can't pickle <class '__main__.MyEnv'>: attribute lookup MyEnv on __main__ failed

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.5/runpy.py", line 184, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.5/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/ubuntu/TF1/lib/python3.5/site-packages/garage/experiment/experiment_wrapper.py", line 251, in <module>
    run_experiment(sys.argv)
  File "/home/ubuntu/TF1/lib/python3.5/site-packages/garage/experiment/experiment_wrapper.py", line 166, in run_experiment
    args.resume_from_epoch)
  File "p2_custom_env.py", line 98, in train
    runner.train(n_epochs=1, batch_size=10)
  File "/home/ubuntu/TF1/lib/python3.5/site-packages/garage/experiment/local_runner.py", line 255, in train
    return self.algo.train(self)
  File "/home/ubuntu/TF1/lib/python3.5/site-packages/garage/tf/algos/batch_polopt.py", line 92, in train
    for epoch in runner.step_epochs():
  File "/home/ubuntu/TF1/lib/python3.5/site-packages/garage/experiment/local_runner.py", line 300, in step_epochs
    self._shutdown_worker()
  File "/home/ubuntu/TF1/lib/python3.5/site-packages/garage/experiment/local_runner.py", line 108, in _shutdown_worker
    self.sampler.shutdown_worker()
  File "/home/ubuntu/TF1/lib/python3.5/site-packages/garage/sampler/on_policy_vectorized_sampler.py", line 50, in shutdown_worker
    self.vec_env.close()
AttributeError: 'NoneType' object has no attribute 'close'

It seems to me that both examples are quite outdated. Can anyone please help me create a custom environment?

Any help would be appreciated

Regards,

Steve

avnishn commented 4 years ago

Hi Steve,

Thanks for using garage! We are in the process of overhauling our examples and docs and plan to have them updated by our June release.

To help you with your problem: 1) Create a custom gym environment using the OpenAI Gym API. This is covered in https://github.com/rlworkgroup/garage/blob/master/examples/jupyter/custom_env.ipynb under the section "Custom gym environment".

2) Wrap this custom environment using garage.envs.GarageEnv, e.g. env = GarageEnv(steve_custom_env).

It looks to me like you're missing step 2. Gym envs are not pickleable by default; however, the GarageEnv wrapper makes them pickleable. After step 2 you should be able to use your environment with garage algorithms!
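
For reference, here is a minimal sketch of both steps; the class name MyToyEnv and its dynamics are made up for illustration, and keeping the class in an importable module (rather than your entrypoint script) also avoids pickling surprises:

import gym
import numpy as np
from gym import spaces

from garage.envs import GarageEnv


class MyToyEnv(gym.Env):
    """Step 1: a custom environment that follows the OpenAI Gym API."""

    def __init__(self):
        super().__init__()
        self.action_space = spaces.Discrete(2)
        self.observation_space = spaces.Discrete(2)

    def reset(self):
        return 0

    def step(self, action):
        obs = int(np.random.rand() < 0.5)   # random next observation
        reward = float(action == obs)       # reward for matching it
        return obs, reward, False, {}


# Step 2: wrap the gym environment so garage can pickle and sample it.
env = GarageEnv(MyToyEnv())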

sdfzz commented 4 years ago

Avnishn, thank you so much for the quick reply.

I agree with you that I've missed step 2.

Unfortunately I won't be able to test it right now, but I will give it a try as soon as I can...

Best Regards,

Steve

avnishn commented 4 years ago

@sdfzz I am going to close this issue for now. Please feel free to reopen if there are any more issues with regards to your original inquiry.

sdfzz commented 4 years ago

Hi,

I've tested garage.envs.GarageEnv, but I'm still getting the AttributeError: 'NoneType' object has no attribute 'close' error...

I'm attaching my source code below... can you please check whether you can reproduce my error? I'm running the code under Ubuntu 16.04 with Python 3.5.2 and TensorFlow 1.15.2.

One workaround I've found is to create a custom gym environment and import it. Then, I can use my own environment just like a gym environment.

Best Regards,

Steve

import gym
from gym import spaces
import numpy as np
import random
import tensorflow as tf
from garage.envs import normalize
from garage.experiment.deterministic import set_seed
from garage.tf.envs import TfEnv
from garage.tf.experiment import LocalTFRunner
from garage.envs.base import Step
from garage.envs.env_spec import EnvSpec
from garage.np.baselines import LinearFeatureBaseline 
from garage.experiment import run_experiment
from garage.tf.algos import TRPO
from garage.tf.envs import TfEnv
from garage.tf.policies import CategoricalMLPPolicy
from gym.envs.registration import register
from garage.envs import GarageEnv

class MyEnv(gym.Env):
  def __init__(self):
    self.action_space = spaces.Discrete(2)
    self.observation_space = spaces.Discrete(2)
    self.reset()

  def step(self, action):
    assert self.action_space.contains(action), "action not in action space"

    self.state = np.random.rand() < 0.5

    reward = (action == self.state)
    self.score += reward

    return self.state, reward, False, {}

  def reset(self):
    self.score = 0
    return 0

register(
  id='MyEnv-v0',
  entry_point=MyEnv,)

env = gym.make("MyEnv-v0")

class NpWrapper(gym.ObservationWrapper):
  def observation(self, observation):
    obs = np.array(observation).astype('int')
    return obs

env = NpWrapper(env)
env = TfEnv(normalize(env))
env = GarageEnv(env)  # wrapping custom env using garage.envs.GarageEnv

def train(snapshot_config, *_):
  with LocalTFRunner(snapshot_config=snapshot_config) as runner:
    policy = CategoricalMLPPolicy(
               name="policy",
               env_spec=env.spec,
               hidden_sizes=(32,32))
    baseline=LinearFeatureBaseline(env_spec=env.spec) 
    algo = TRPO(
             env_spec=env.spec,
             policy=policy,
             baseline=baseline,
             max_path_length=50,
             discount=0.99,
             max_kl_step=0.01)
    runner.setup(algo, env)

    runner.train(n_epochs=120, batch_size=2048, plot=False)

run_experiment(train, snapshot_mode="last", seed=1,)

Error message:

Traceback (most recent call last):
  File "/usr/lib/python3.5/runpy.py", line 184, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.5/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/ubuntu/TF1/lib/python3.5/site-packages/garage/experiment/experiment_wrapper.py", line 251, in <module>
    run_experiment(sys.argv)
  File "/home/ubuntu/TF1/lib/python3.5/site-packages/garage/experiment/experiment_wrapper.py", line 166, in run_experiment
    args.resume_from_epoch)
  File "p0_model_test.py", line 80, in train
    runner.train(n_epochs=120, batch_size=2048, plot=False)
  File "/home/ubuntu/TF1/lib/python3.5/site-packages/garage/experiment/local_runner.py", line 255, in train
    return self.algo.train(self)
  File "/home/ubuntu/TF1/lib/python3.5/site-packages/garage/tf/algos/batch_polopt.py", line 92, in train
    for epoch in runner.step_epochs():
  File "/home/ubuntu/TF1/lib/python3.5/site-packages/garage/experiment/local_runner.py", line 300, in step_epochs
    self._shutdown_worker()
  File "/home/ubuntu/TF1/lib/python3.5/site-packages/garage/experiment/local_runner.py", line 108, in _shutdown_worker
    self.sampler.shutdown_worker()
  File "/home/ubuntu/TF1/lib/python3.5/site-packages/garage/sampler/on_policy_vectorized_sampler.py", line 50, in shutdown_worker
    self.vec_env.close()
AttributeError: 'NoneType' object has no attribute 'close'

ryanjulian commented 4 years ago

Note that the gym API requires that you implement close(). Because you did not call the super constructor in your custom environment, you didn't inherit the close() implementation from gym.Env.

Please see the gym.Env API here and ensure that you implement all parts of the API contract, to avoid future errors.

Environments are only closed at the end of training, so from this log it looks to me like your system is working fine and you are just missing the close() part of the API.
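
For illustration, a minimal sketch of that part of the contract (the environment is hypothetical; the relevant points are calling the super constructor and providing close()):

import gym
from gym import spaces


class MyEnv(gym.Env):
    def __init__(self):
        super().__init__()  # keep gym.Env's default behavior for the rest of the API
        self.action_space = spaces.Discrete(2)
        self.observation_space = spaces.Discrete(2)

    def close(self):
        # Release any resources (viewers, files, subprocesses) held by the env.
        pass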

sereysethy commented 4 years ago

Hi Ryan,

I tried the code and added close() to my custom env, but it still failed. I think it couldn't do the serialisation because of NpWrapper; I'm not sure why that is needed.

Is it possible to skip registering the env and to pass it directly to GarageEnv?

What is the correct order of steps for defining a new custom env?

import gym
from gym import spaces
import numpy as np
import random
import tensorflow as tf
from garage.envs import normalize
from garage.experiment.deterministic import set_seed
from garage.tf.envs import TfEnv
from garage.tf.experiment import LocalTFRunner
from garage.envs.base import Step
from garage.envs.env_spec import EnvSpec
from garage.np.baselines import LinearFeatureBaseline 
from garage.experiment import run_experiment
from garage.tf.algos import TRPO
from garage.tf.envs import TfEnv
from garage.tf.policies import CategoricalMLPPolicy
from gym.envs.registration import register
from garage.envs import GarageEnv

class MyEnv(gym.Env):
    def __init__(self):
        self.action_space = spaces.Discrete(2)
        self.observation_space = spaces.Discrete(2)
        self.reset()

    def step(self, action):
        assert self.action_space.contains(action), "action not in action space"

        self.state = np.random.rand() < 0.5

        reward = (action == self.state)
        self.score += reward

        return self.state, reward, False, {}

    def reset(self):
        self.score = 0
        return 0

    def close(self):
        return True

register(
    id='MyEnv-v0',
    entry_point=MyEnv,)

env = gym.make("MyEnv-v0")

class NpWrapper(gym.ObservationWrapper):
    def observation(self, observation):
        obs = np.array(observation).astype('int')
        return obs

env = NpWrapper(env)
env = TfEnv(normalize(env))
env = GarageEnv(env)  # wrapping custom env using garage.envs.GarageEnv

def train(snapshot_config, *_):
    with LocalTFRunner(snapshot_config=snapshot_config) as runner:
        policy = CategoricalMLPPolicy(
                name="policy",
                env_spec=env.spec,
                hidden_sizes=(32,32))
        baseline=LinearFeatureBaseline(env_spec=env.spec) 
        algo = TRPO(
                env_spec=env.spec,
                policy=policy,
                baseline=baseline,
                max_path_length=50,
                discount=0.99,
                max_kl_step=0.01)
        runner.setup(algo, env)

        runner.train(n_epochs=1, batch_size=2048, plot=False)

run_experiment(train, snapshot_mode="last", seed=1,)

Output

WARNING:tensorflow:From /redacted/opt/anaconda3/lib/python3.7/site-packages/akro/discrete.py:113: The name tf.placeholder is deprecated. Please use tf.compat.v1.placeholder instead.

WARNING:tensorflow:From /redacted/opt/anaconda3/lib/python3.7/site-packages/tensorflow_core/python/ops/array_ops.py:1475: where (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
Traceback (most recent call last):
  File "/redacted/opt/anaconda3/lib/python3.7/site-packages/garage/experiment/local_runner.py", line 279, in step_epochs
    self._start_worker()
  File "/redacted/opt/anaconda3/lib/python3.7/site-packages/garage/experiment/local_runner.py", line 100, in _start_worker
    self.sampler.start_worker()
  File "/redacted/opt/anaconda3/lib/python3.7/site-packages/garage/sampler/on_policy_vectorized_sampler.py", line 39, in start_worker
    envs = [pickle.loads(pickle.dumps(self.env)) for _ in range(n_envs)]
  File "/redacted/opt/anaconda3/lib/python3.7/site-packages/garage/sampler/on_policy_vectorized_sampler.py", line 39, in <listcomp>
    envs = [pickle.loads(pickle.dumps(self.env)) for _ in range(n_envs)]
_pickle.PicklingError: Can't pickle <class '__main__.NpWrapper'>: attribute lookup NpWrapper on __main__ failed

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/redacted/opt/anaconda3/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/redacted/opt/anaconda3/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/redacted/opt/anaconda3/lib/python3.7/site-packages/garage/experiment/experiment_wrapper.py", line 251, in <module>
    run_experiment(sys.argv)
  File "/redacted/opt/anaconda3/lib/python3.7/site-packages/garage/experiment/experiment_wrapper.py", line 166, in run_experiment
    args.resume_from_epoch)
  File "src/cowrie/smartproxy/learning/test.py", line 74, in train
    runner.train(n_epochs=1, batch_size=2048, plot=False)
  File "/redacted/opt/anaconda3/lib/python3.7/site-packages/garage/experiment/local_runner.py", line 255, in train
    return self.algo.train(self)
  File "/redacted/opt/anaconda3/lib/python3.7/site-packages/garage/tf/algos/batch_polopt.py", line 92, in train
    for epoch in runner.step_epochs():
  File "/redacted/opt/anaconda3/lib/python3.7/site-packages/garage/experiment/local_runner.py", line 300, in step_epochs
    self._shutdown_worker()
  File "/redacted/opt/anaconda3/lib/python3.7/site-packages/garage/experiment/local_runner.py", line 108, in _shutdown_worker
    self.sampler.shutdown_worker()
  File "/redacted/opt/anaconda3/lib/python3.7/site-packages/garage/sampler/on_policy_vectorized_sampler.py", line 50, in shutdown_worker
    self.vec_env.close()
AttributeError: 'NoneType' object has no attribute 'close'

ryanjulian commented 4 years ago

In order for garage to provide features such as snapshotting and high-performance parallel sampling, we require that all environments and algorithms be pickleable.

All Python objects implement the pickle protocol, but sometimes it's necessary to provide an explicit implementation (via __getstate__ and __setstate__) when an object has non-pickleable members.
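
As a generic (non-garage) sketch of that pattern, here is a hypothetical wrapper holding an open file handle that pickle cannot serialize:

class LoggingWrapper:
    def __init__(self, env, log_path):
        self.env = env
        self.log_path = log_path
        self._log_file = open(log_path, 'a')  # file handles are not pickleable

    def __getstate__(self):
        state = self.__dict__.copy()
        del state['_log_file']                # drop the non-pickleable member
        return state

    def __setstate__(self, state):
        self.__dict__.update(state)
        self._log_file = open(self.log_path, 'a')  # recreate it after unpickling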

In your case, I think you're running afoul of a quirk of Python, which is that it's difficult to pickle classes defined in __main__ (the entrypoint script), in your case NpWrapper. You can likely solve this problem by simply placing NpWrapper and MyEnv inside their own module (.py file) and importing it from your script.
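
For example (the module name my_envs.py is just a placeholder):

# my_envs.py -- move the environment classes out of the entrypoint script so
# that pickle can look them up by module path.
import gym
import numpy as np
from gym import spaces


class MyEnv(gym.Env):
    def __init__(self):
        self.action_space = spaces.Discrete(2)
        self.observation_space = spaces.Discrete(2)
    # step(), reset() and close() exactly as in your script above


class NpWrapper(gym.ObservationWrapper):
    def observation(self, observation):
        return np.array(observation).astype('int')

Then, in your training script, replace the inline class definitions with from my_envs import MyEnv, NpWrapper.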

Another option is to switch to RaySampler or MultiprocessingSampler, our more recent sampler implementations which use cloudpickle to get around this quirk of Python. You do this by importing a sampler class and passing it to runner.setup, as in this example (make sure to replace BatchSampler with MultiprocessingSampler).
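
If you go that route, the change is roughly the following; note that I'm assuming runner.setup accepts the sampler class via its sampler_cls argument, as in the example linked above, so check that example for the exact signature in your installed version:

from garage.sampler import MultiprocessingSampler  # or RaySampler
from garage.tf.experiment import LocalTFRunner


def train(snapshot_config, *_):
    with LocalTFRunner(snapshot_config=snapshot_config) as runner:
        # ... define env, policy, baseline and algo as in your script ...
        runner.setup(algo, env, sampler_cls=MultiprocessingSampler)
        runner.train(n_epochs=120, batch_size=2048)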

We refrained from fixing this in OnPolicyVectorizedSampler because we plan on deleting it soon, in favor of the new sampling API. However, I've opened an issue to fix this, since we'll still include these older samplers in the forthcoming release.

sereysethy commented 4 years ago

I put them together in a module, and that seems to solve the problem; at least, it started training.

My next questions (I am not sure whether I should open a new issue):

Here is what I want to do: I want to build a custom environment and use a simple algorithm with something like LinearFeatureBaseline, but I want to build my own baseline, just to see what result it gives. My observation is text plus some other numbers, but I can vectorise it first. My objective is to build something that is simple enough that I can understand how Garage works before diving deep into the other algorithms. I tried to read the code, but I think without your help, I am lost.

ryanjulian commented 4 years ago

  1. No, there is no need to register your gym environment. As long as your environment implements the interface, it can be used.
  2. LocalTFRunner is a subclass of LocalRunner which adds some setup needed for TensorFlow. It should be used with TensorFlow-based algorithms, whereas PyTorch and NumPy-based algorithms may use LocalRunner. Their interfaces are identical (a minimal sketch follows this list).
  3. abc is the abstract base classes module from the Python standard library. You can read up on them here: https://pymotw.com/3/abc/
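
A minimal sketch of point 2, assuming algo, env and snapshot_config are already defined as elsewhere in this thread; both runners expose the same setup()/train() interface:

from garage.experiment import LocalRunner        # for PyTorch/NumPy algorithms
from garage.tf.experiment import LocalTFRunner   # for TensorFlow algorithms

# TensorFlow-based algorithm (e.g. garage.tf.algos.TRPO):
with LocalTFRunner(snapshot_config=snapshot_config) as runner:
    runner.setup(algo, env)
    runner.train(n_epochs=100, batch_size=2048)

# PyTorch- or NumPy-based algorithm:
runner = LocalRunner(snapshot_config=snapshot_config)
runner.setup(algo, env)
runner.train(n_epochs=100, batch_size=2048)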

Much of the code in garage may be hard to understand without a background in the core concepts of reinforcement learning. For a hands-on introduction to RL, I highly recommend Josh Achiam's Spinning Up tutorial.

For instance, a baseline (also called a value function) is a component used by an algorithm, rather than an algorithm itself. Defining a custom baseline for a problem can help RL algorithms solve it faster, at the expense of making those solutions less general. Writing a custom baseline is a fine starter project, though I would caution that the code for baselines in the TensorFlow tree is a bit over-engineered at the moment.

sereysethy commented 4 years ago

Thank you for your reply.

In fact, I understand the core concepts of RL, but I have never implemented its algorithms. That is why I am looking for libraries that already implement some of those algorithms, which in turn will help me speed up my work. Thanks for the link, by the way.

So what I want to do is understand how Garage works so that I can integrate it into my research, and eventually customise it to fit my needs. I hope the documentation to be released in June can shed more light on the Garage architecture, so that people like myself can bootstrap and contribute to the development of this library.

Suppose that I want to write my own algorithm: should I subclass RLAlgorithm? What are the steps to do it?

ryanjulian commented 4 years ago

@sereysethy It's great to hear that you've already covered the core concepts! I apologize that my previous comment may have come across as dismissive. I just don't want people getting lost here when there are better learning resources.

In contrast to many other repos of RL algorithms, garage is designed to be imported as a library, so that you can use its components and algorithms without editing the garage repository directly. There are not many examples of this usage in the wild, but I encourage you to try using it this way.

The inheritance hierarchy and API types in garage are a bit of a mess right now (there are several Policy interfaces, for instance, though they are mostly mutually compatible), which is another major work item for this summer.

As for the "documentation release" in June, a word of caution: There is no private library of unreleased documentation, which we are polishing in preparation for June. We (the maintainers) are just hoping to be able to devote some time to design, getting started, and tutorial documentation this summer. Keep in mind we are all volunteers whose main job is not garage, so timelines often become extended and progress comes in fits and starts.

The documentation we do have is docstrings throughout the codebase, documenting the APIs of each object and function. These almost always include type information. Your next best resource for examples, other than examples/ and the package source itself, is the test suite in tests/, which contains a usage example for every function and object in garage.

We would love developers like yourself to bootstrap and start contributing, and there's no need to learn all of RL algorithm development to start doing that! For instance, if you are interested, we would love to see pull requests adding high-level documentation (i.e. how do I add an algorithm?) like you're describing.

As for how to get started with a new algorithm, your best bet is to start with an existing one as a template. While it is true that garage.np.algos.RLAlgorithm defines the interface for all RL algorithms, that interface is only one function and doesn't give you much guidance on how to actually structure your algorithm's code. You should start by identifying which kind of RL algorithm you're developing (broadly, off-policy or on-policy), whether you need support for continuous action spaces (only some algorithms can target continuous actions, and this adds significant complexity), and which framework you'd like to use (NumPy, TensorFlow, or PyTorch). For frameworks, our support and examples are most comprehensive in TensorFlow (v1), as that codebase is most mature. Many tasks are easier in PyTorch, but we have fewer ready-made components and algorithms in that tree. When a component or algorithm is missing from PyTorch, it is usually fairly straightforward to port it from TensorFlow. You can use the NumPy tree for algorithms which don't need neural network support, such as black-box optimization or evolutionary strategies.
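
To make that concrete, here is a hedged skeleton of an RLAlgorithm subclass; the body is placeholder logic, and it assumes the runner methods used by the built-in algorithms (step_epochs(), obtain_samples(), step_itr):

from garage.np.algos import RLAlgorithm


class MyAlgo(RLAlgorithm):
    def __init__(self, env_spec, policy, max_path_length=100, discount=0.99):
        self.env_spec = env_spec
        self.policy = policy
        self.max_path_length = max_path_length
        self.discount = discount

    def train(self, runner):
        """The single method required by the RLAlgorithm interface."""
        for _ in runner.step_epochs():
            paths = runner.obtain_samples(runner.step_itr)  # collect a batch of rollouts
            # ... compute returns/advantages and update self.policy here ...
            runner.step_itr += 1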

Please keep us posted on your progress and feel free to open more issues (and pull requests!) for assistance!

sereysethy commented 4 years ago

Once again, thank you for your reply. No worries, I really appreciate what you told me. I came across the resource you pointed out before, but I haven't had a chance to go through it. I followed David Silver's online course and R. Sutton's book. As I said before, my primary research is not about RL, but I am very much interested in it. The resource you pointed me to will give me a push toward understanding how RL algorithms are implemented.

I don't plan to modify the Garage library, but I do plan to use it for my research. I want to start with value-based methods using function approximation; as I mentioned, a linear function as a baseline, to see whether what I want to do makes sense before moving to a more complex function. That is the first step; the second step will be to test PG algorithms. My action space is discrete, but my observation is a bit more complex. As for the framework, I am more interested in PyTorch, as it is more research-oriented.

My last question for today: given the current state of Garage (I know it is a biased question), do you think it is a good idea for me to use Garage? Will I be in trouble later? ;-)

ryanjulian commented 4 years ago

I think it's a great idea to use garage, as I (and the other maintainers) use it extensively in our own research.

The biggest downside of garage is missing high-level documentation, as you've already found out. The biggest upside is that it's well-tested and makes it easy to experiment with new ideas quickly and by writing just a little code, rather than copy-pasting huge bodies of monolithic algorithm functions.

The only danger you should be aware of when using garage in your own research is that many APIs are still changing frequently. The fundamentals are very stable, but we are still working on standardizing the interfaces. If you install from master, some days you may update your environment to find we've changed a function signature, moved a package, etc. Generally these changes are small, are well documented by our commit messages, and can be fixed quickly. If you update almost every day, you will likely hardly notice.

If you need a more stable codebase, you can install exclusively from pip. We don't make breaking API changes within a release series, but we do fix bugs in them for about 9 months. That is, if your codebase relies on v2019.10.0, we promise that v2019.10.1 will contain only bug fixes and not breaking API changes (but also, no new features). This allows you to decide when to switch to a new release series (such as the upcoming June release), which will have new features, but might also have API changes that force you to change your code.