rlworkgroup / garage

A toolkit for reproducible reinforcement learning research.

Custom env implementation #1407

Closed sereysethy closed 4 years ago

sereysethy commented 4 years ago

Hello,

I am implementing a custom environment that needs to interact with other services (e.g. through a socket, querying a database, etc.). 1) My first question is how to implement such an environment without running into serialisation issues when running experiments using the local runner. 2) And once the training is finished, how can I integrate the result into the application for testing (something like production)?

At this point I am more interested in my first question: what implementation design should I consider? Or how can I start the training without using the local runner? Can it be called programmatically?

Best, Sethy

avnishn commented 4 years ago

Hi @sereysethy, thanks for using Garage!

1. Assuming that your environment implements the gym.Env interface and that the services it uses (the socket and database-querying modules) are pickleable, you should be able to wrap your custom environment using garage.envs.GarageEnv.

2. Take a look at the script examples/sim_policy.py. If you use snapshotting to save your experiments (this is enabled by default, and the resulting snapshot is written to a directory called data/ created wherever you run your experiment from), you should be able to reload your pickled policy and integrate it into your custom application for testing. The pickled policy will continue to implement the interface of garage.np.Policy, which can be found here: https://github.com/rlworkgroup/garage/blob/master/src/garage/np/policies/policy.py
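Roughly, reloading and running a snapshotted policy looks like the sketch below (along the lines of examples/sim_policy.py; the snapshot path is made up, the stored keys and session handling may differ between garage versions, and this assumes a TF-based policy):

```python
import joblib
import tensorflow as tf

# Hypothetical snapshot path; snapshots are written under data/ by default.
snapshot_file = 'data/local/experiment/my_experiment/params.pkl'

with tf.compat.v1.Session():
    data = joblib.load(snapshot_file)
    policy = data['algo'].policy   # implements garage.np.Policy
    env = data['env']

    obs = env.reset()
    done = False
    while not done:
        # get_action returns (action, agent_info); we only need the action here.
        action, _ = policy.get_action(obs)
        obs, reward, done, _ = env.step(action)
```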

Lastly, we strongly recommend using the local runner for running experiments, as it handles tasks such as logging and snapshotting of results, and allocating resources for components such as samplers. That being said, the components needed to run an algorithm (samplers, policies, Q-functions, replay buffers, etc.) don't actually require the local runner, so in theory you could use these components in an algorithm and runner that you have written yourself. However, in order to use any of the algorithms in garage, you will need to use the local runner.
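For context, a typical launcher looks roughly like this (a sketch based on the scripts in examples/tf/; exact constructor arguments vary a bit between garage versions, and your wrapped custom environment would go where CartPole is used here):

```python
import gym

from garage import wrap_experiment
from garage.envs import GarageEnv
from garage.experiment import LocalTFRunner
from garage.experiment.deterministic import set_seed
from garage.np.baselines import LinearFeatureBaseline
from garage.tf.algos import TRPO
from garage.tf.policies import CategoricalMLPPolicy


@wrap_experiment
def my_experiment(ctxt=None, seed=1):
    set_seed(seed)
    with LocalTFRunner(ctxt) as runner:
        # Your custom gym.Env subclass would be wrapped here instead of CartPole.
        env = GarageEnv(gym.make('CartPole-v1'))

        policy = CategoricalMLPPolicy(env_spec=env.spec, hidden_sizes=(32, 32))
        baseline = LinearFeatureBaseline(env_spec=env.spec)
        algo = TRPO(env_spec=env.spec,
                    policy=policy,
                    baseline=baseline,
                    max_path_length=100,
                    discount=0.99)

        runner.setup(algo, env)
        # batch_size is the number of env steps collected per optimization iteration.
        runner.train(n_epochs=100, batch_size=4000)


my_experiment(seed=1)
```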

I hope this answers your questions, Avnish

sereysethy commented 4 years ago

Hi Avnish,

I think this might not be related to garage; it is more of an implementation detail. A socket is not pickleable, only plain objects can be pickled. At this point I am stuck: I do not know how to make a connection between the env object and external services. The service that I call is async; it uses a callback when the result is received.

If I can illustrate my idea for my env, it is something like this (with a rough code sketch after the list):

  1. initiate a client connection through a socket to a server
  2. inject that connection as an attribute of my env
  3. start experiment
  4. when taking a step in the env, send a request to the server through the client connection
  5. when the reply is received, the env object is called back with the reply, and only at this point can the step return a new observation
  6. go to step 4
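And roughly in code, the step would look something like this (the client methods, the reply format, and the spaces are just placeholders for my actual Twisted code):

```python
import queue

import gym
import numpy as np


class RemoteServiceEnv(gym.Env):
    """Env whose dynamics live in an external server, reached via a client connection."""

    def __init__(self, client):
        # Step 2: the already-connected client is injected into the env.
        self._client = client
        # Step 5: the client's reply callback pushes results into this queue.
        self._replies = queue.Queue()
        self._client.on_reply(self._replies.put)

        # Example spaces only.
        self.observation_space = gym.spaces.Box(-np.inf, np.inf, shape=(4,))
        self.action_space = gym.spaces.Discrete(2)

    def step(self, action):
        # Step 4: send the action to the server over the client connection.
        self._client.send_request(action)
        # Step 5: block until the callback delivers the reply, then return it.
        obs, reward, done = self._replies.get()
        return obs, reward, done, {}

    def reset(self):
        self._client.send_reset()
        obs, _, _ = self._replies.get()
        return obs
```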

Best, Sethy

avnishn commented 4 years ago

@sereysethy

The first thing you should know is that environments are pickled for two reasons in garage: 1) they are pickled by the snapshotter so that training can later be resumed if it is interrupted, and 2) the sampler used by your algorithm may require pickling, e.g. the Ray sampler and the multiprocessing sampler.

Pickling relies on the __getstate__ and __setstate__ methods. __getstate__ is called during the pickling operation, and __setstate__ is called during the unpickling operation. You could pass the parameters of the socket to your environment as a dict and save them as attributes. Then, when __setstate__ is called during the unpickling process, you can construct a new socket with the socket parameters that were pickled and resume operations.
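A minimal sketch of the idea (host/port stand in for whatever parameters your socket needs; step()/reset() are omitted):

```python
import socket

import gym


class SocketBackedEnv(gym.Env):

    def __init__(self, host, port):
        self._host = host
        self._port = port
        self._sock = socket.create_connection((host, port))

    def __getstate__(self):
        # Keep only the picklable attributes; drop the live socket.
        state = self.__dict__.copy()
        del state['_sock']
        return state

    def __setstate__(self, state):
        # Restore the plain attributes, then rebuild the socket connection.
        self.__dict__.update(state)
        self._sock = socket.create_connection((self._host, self._port))
```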

Also, generally, RL algorithms are bound in their runtime performance by the runtime of the sampling process. Accessing a socket during sampling sounds to me like an incredibly slow process, so you might want to reduce the number of socket receive calls you are making by prefetching the data you need beforehand.

sereysethy commented 4 years ago

Thank you for your quick reply.

When you talk about sampling, is the sampling done based on the current training algorithm? I mean, if it is policy based, is the action taken based on the current policy? Or if it is Q-learning, is the action chosen by the Q-function? Reading the code, I see each algorithm has a sampler_cls attached to its base algorithm, so is it already defined?

By the way, the suggestion of prefetching data is fine if you have the data beforehand, but in my case I do not, because the data is only produced when the action is taken on the env. I could run all the actions on different observations and record the new observations, rewards, etc., but that would not capture the real dynamics of the env. For a real-life problem that cannot be simulated, how can the sampler be used?

In the runner, what is batch_size? Is it a parameter to the sampler?

ryanjulian commented 4 years ago

I implore you to read the code for answers to these questions. There's no substitute for reading code.

It is the algorithm's responsibility to provide a policy which will be used for sampling. Currently, this is retrieved by LocalRunner with the algo.policy property.

In the case of executing an environment which is tied to the real world, sampling works exactly the same, but it will run much slower. Some speedups, such as vectorization or running multiple sampling workers, may make no sense or be unavailable. Please read the code in the garage.sampler package and additionally LocalRunner.

batch_size is universally defined as the number of environment steps collected for each iteration of optimization.

sereysethy commented 4 years ago

I did read the code, and I am still doing so. It helps me a lot to understand Garage; I asked questions just to confirm my understanding. Sorry for the naive questions.

In fact I wanted to go fast, and I did not want to hack the library to fulfil my needs; that is why I wanted to understand it. After a lot of reflection, I think the way garage is built is not suitable for a complex custom environment, at least for a real-world problem that cannot be simulated. To do what I want, I would need to make a lot of changes, and I cannot use LocalRunner for my training, because my env depends on an external entity or an external program that runs in its own process/thread (a client/server application). If I want to use Garage, my option is to not use LocalRunner, and I will have to call train_once myself. I see how sampling is done, but the problem is my env, which is difficult; I do not know how it can be made pickleable. The garage algos are too tied to the runner.

I have read the documentation of Ray/RLlib; they have something called ExternalEnv that allows the training to be independent from the env, with which it communicates. I feel that RLlib is more flexible.

avnishn commented 4 years ago

Hi @sereysethy,

As I mentioned before, I think that this is the solution to your problem:

Pickling relies on the __getstate__ and __setstate__ methods. __getstate__ is called during the pickling operation, and __setstate__ is called during the unpickling operation. You could pass the parameters of the socket to your environment as a dict and save them as attributes. Then, when __setstate__ is called during the unpickling process, you can construct a new socket with the socket parameters that were pickled and resume operations.

The idea here is that since your socket can't be pickled, you instead pass the socket parameters you need to your custom environment, and then, when it is unpickled, have the environment create a new socket with those parameters.

I understand that it is frustrating that Garage doesn't easily support your use case with pre-baked logic. This may even be something that we support in the future if enough users need it, but at present we don't have any plans to add this feature. As a group of researchers, we support you in your endeavor to see your project/idea to fruition, and if RLlib helps you achieve this, then by all means please use it.

However, it should be noted that, because reinforcement learning is still an active area of research, no research framework will be able to encompass all the possible use cases that users have. This is where the beauty of Garage comes in: by being a relatively lightweight framework that doesn't constrain users heavily, it lets users combine garage primitives and the local runner with their own custom ideas (such as your environment that uses sockets to query information from some external source).

Since I think that Ryan and I have answered your garage-related questions to the best of our ability, I'm going to close this issue. If you need an easy way to look at our API documentation so that it's easier to answer questions such as "What is the batch size?", you can always look at the code, specifically the docstrings of functions, or you can consult this: https://garage.readthedocs.io/en/v2020.04rc1/py-modindex.html

Anyways, happy hunting, best of luck with your research, Avnish

sereysethy commented 4 years ago

Hi Avnish and Ryan,

I really appreciate your answers. I am a researcher myself and I am willing to contribute to the development of Garage. I am very interested in RL; it is also part of my research interests. I see the beauty of Garage, but as I mentioned, at this stage Garage is too tied to LocalRunner, which limits its usage with a custom environment that cannot be simulated in some cases. In my use case, my system has to communicate with another system using a Twisted client when the agent takes an action, and putting Twisted in Garage is impossible. Creating a socket each time an action is taken is extremely expensive, and another problem is that Twisted is event driven, so it uses callbacks when data is available.

My idea for Garage, inspired by Ray/RLlib, is to decouple the way sampling works by creating a server that interfaces with the env through a client/server architecture; the env would still be passed to the local runner, but as a stub. Garage could then be used to learn a policy and also serve it through an API. This would make it more appealing to other people who want to use Garage in their own ecosystem. I know Garage has its own research purpose, and I do not know what your next roadmap is.

Best, Sethy

ryanjulian commented 4 years ago

Twisted in Garage is impossible.

There's no reason garage and twisted can't coexist in the same Python environment. twisted could be used to implement any of the APIs in garage (ReplayBuffer, Policy, Environment, etc.).

If you are referring to the fact that the garage APIs are synchronous, well, this is intentional. MDPs are synchronous, so the MDP API (gym.Env) is synchronous. Sampling an action from a policy is synchronous, so the Policy API is synchronous.

The two parts of an RL training process which can be asynchronous are sampling and optimization. Off-policy RL algorithms can use asynchronous sampling, but in practice most do not unless they are used for very-large scale projects. This would not be very hard to add to garage, and is something we'll likely do in the future to simplify sampling for off-policy algorithms.

Asynchronous optimization requires special classes of RL algorithms (e.g. IMPALA), and would require more extensive modification and a more general Runner API. Supporting fully-asynchronous algorithms is probably much farther away (a year or more).

RLlib is an impressive feat of engineering, and it takes a very different design approach from garage. In general, when a new feature arises, RLlib adds API complexity to support that feature. RLlib has tons of features, but also an enormous API. In general, when a new feature arises, garage seeks to add as little complexity as possible to support that feature. Usually, this means making existing APIs a little more general, while striving to keep the simple case easy to use and understand.

As a result, in order to use RLlib for a simple project, you have to implement, or at least understand, your project as a completely-asynchronous RL training process, which might involve thousands of individual workers, sharded replay buffers, and multi-GPU asynchronous trainers. This requires many layers of abstraction, and often advanced design patterns which obfuscate the programmer's intention in order to allow for greater generality.

Developing code using RLlib is slower, but once it is running it can quickly scale to very large problems. Developing code using garage is faster, but scaling it will take additional work. Those who are willing to develop more slowly but need to scale quickly should absolutely use RLlib. Those who are interested in rapid development should use garage. One day, garage may have enough features to scale as quickly as RLlib (nothing in its design precludes this possibility), but so far we are focusing on rapid development over rapid scaling.

My idea for Garage, inspired by Ray/RLlib, is to decouple the way sampling works by creating a server that interfaces with the env through a client/server architecture; the env would still be passed to the local runner, but as a stub.

You can do this today, by writing an environment implementation whose Env.step() and Env.reset() methods are actually backed by blocking calls to whatever RPC you want.
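For example, a rough sketch with a plain HTTP service standing in for your RPC layer (the URL, the endpoints, and the JSON fields are made up for illustration):

```python
import gym
import numpy as np
import requests


class HTTPBackedEnv(gym.Env):
    """Env whose step()/reset() are blocking HTTP calls to a remote service."""

    def __init__(self, base_url='http://localhost:8080'):
        self._base_url = base_url
        # Example spaces only.
        self.observation_space = gym.spaces.Box(-np.inf, np.inf, shape=(4,))
        self.action_space = gym.spaces.Discrete(2)

    def reset(self):
        reply = requests.post(self._base_url + '/reset').json()
        return np.array(reply['observation'])

    def step(self, action):
        reply = requests.post(self._base_url + '/step',
                              json={'action': int(action)}).json()
        return (np.array(reply['observation']), reply['reward'],
                reply['done'], {})
```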

at this stage Garage is too tied to LocalRunner, which limits its usage with a custom environment

Virtually all parts of garage are interchangeable, including LocalRunner. If you'd like, you can implement your own runner class with different semantics and pass it to the algorithms instead.

limits its usage with a custom environment that cannot be simulated in some cases.

LocalRunner makes absolutely no assumption that your environment is simulated. Calls to Env.step() could be backed by literally anything -- a socket, an HTTP request, a real robot, a physics simulation, etc. Calling Env.step() could place a phone call to a human and ask them a question, then return the audio as an observation. The only assumption Env.step() makes is that it receives an action and returns an observation, reward, and terminal condition.

sereysethy commented 4 years ago

Hi Ryan,

Thank you for always taking the time to reply to my questions.

In fact my application is in Twisted, and what I tried before was to combine Garage and my custom env, but the problem I faced was env serialisation when I passed a protocol transport (which is a socket) to the env and launched the experiment.

My understanding of Garage is still limited, which in turn does not allow me to use it properly. I want to create a toy project on GitHub and share it with you, in which I will use a Twisted client and combine it with Garage. Will you be able to help me?

ryanjulian commented 4 years ago

@sereysethy feel free to create a toy project as a demonstration and we'll do our best to provide comments.

@avnishn provides a great overview above of what you need to make your project possible. Essentially, you just have to describe to Python how to recreate the socket connections once the environment has been deserialized. In your case, you can't pickle a socket, so you'll have to pickle the information needed to recreate the socket (e.g. the server URI) and then use it to recreate the socket for your object during unpickling. There are many examples of custom pickling in garage (search for __setstate__ and __getstate__). You can consult the Python documentation or https://rszalski.github.io/magicmethods/#pickling for guides on implementing a custom serializer.

sereysethy commented 4 years ago

@ryanjulian @avnishn I invited you to my private GitHub repository. I created a toy project; so far it is purely a Twisted app, but the idea is to work out how to use the Twisted client as an env.