Hey!
First off, sorry for the delay in responding (I haven't been getting alerts about issues). I do really appreciate the input and I'm glad that you're getting some use out of the code.
So I totally agree that there are some real limitations with the interfaces at the moment. I've been meaning to take another look at this to try and cover broader classes of algorithms. I'll need to spend some time looking into the paper you cite to make a proper decision. My only concern is that the crate does not currently support planning. While this would be something worth having in future, I do want to avoid making overly complicated traits just to make everything integrate right now. My priority is having a set of terse but parsimonious learning abstractions.
Anyway, I'm in the (very slow, I realise) process of making some major changes to the framework which may well change a lot of this anyway. For example, I do think that the `Controller` trait is actually somewhat ill-defined, and I'm not keen on my choice to have "target" and "behaviour" policies separated here. As part of that I'll prioritise looking into your suggestions. For now, are you OK to fork the crate and implement the traits you need? If you are, please link it here so I can refer to it as I go.
I think the idea of separating some of the core abstractions into a separate crate makes a lot of sense. This would be akin to something like `num` and `num-traits`. I will absolutely do this, so please do watch out for that; input is always welcome.
Regards, Tom
Thanks for the response, no sweat for the delay!
Designing abstractions is just a very hard problem to tackle given the broad scope of artificial 'intelligence'. The reason this online controller or planner would 'need' mutability is Rust's mutability & ownership rules; in other languages it would not even be a problem at all (until, of course, you end up with bugs). I've thought about it some more, and mutability during evaluation actually feels necessary for a very broad class of agents. I've mentioned online solvers and planners, but anything that does 'continual learning' would fall under it, which in fact every RL agent should be able to do, depending on the needs of the application. Unfortunately I can't give many pointers to influential papers, as my overview of the field is quite limited.
Also, the nature of handler-type methods already makes it quite difficult for any notion of being called sequentially to be reflected in the API and types. Currently, this is the responsibility of the evaluation/training code. So I'm not even sure you can cleanly adapt the traits to the 'online learner/planner' paradigm at all without completely revamping the crate, or defining a crate that is completely disconnected from the rest of the crate.
So there's definitely no need to prioritize my use case in any way. It's hard, and I've no clue how to do it myself in any way that is not a hack. I currently solved my problem by just having a local crate with all the types (since there were some problems with enabling blas here), where I just added a `mut` to the controller :laughing:.
I will keep an eye on separating out the types!
Thanks for the effort, there's no hurry.
Concretely, I feel like the generator pattern would be a flexible and natural abstraction over agents that fits a very broad range of use cases while maintaining a clean interface.

Since generators are an unstable feature (and will likely remain so for a while), I'll experiment a bit with a hand-rolled implementation somewhere in the (not super near) future. I'll try to report progress here, which should by no means be interpreted as a request to consider my conclusions for this crate.

Frankly, I just get excited about implementing environments and agents as generators and then implementing executors, experiments & evaluation over them.
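To make that concrete, here is roughly the shape I have in mind for a hand-rolled, generator-like agent interface. Every name in this sketch is my own invention for illustration, not anything from rsrl:

```rust
// Illustrative sketch only: a hand-rolled, generator-like agent interface.
// Each call to `step` resumes the agent with a fresh observation, and the
// agent either yields an action or signals the end of the episode.
enum AgentStep<A, R> {
    /// The agent yields an action and awaits the next observation.
    Yielded(A),
    /// The episode is finished; the agent returns a terminal value.
    Complete(R),
}

trait AgentGenerator<O> {
    type Action;
    type Return;

    /// Resume the agent with the latest observation. Taking `&mut self`
    /// lets the agent update its internal state (beliefs, plans, ...) on
    /// every step, which is exactly the mutability discussed above.
    fn step(&mut self, observation: O) -> AgentStep<Self::Action, Self::Return>;
}
```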
Hey,
I agree completely. I think generators would be a great thing to support. My only concern is that I don't want to rely on unstable features in the core crate. I've been desperate for some of the features like impl specialisation and GATs for a long time, but held out in favor of stable Rust.
However, I have a proposal. I have been re-writing a lot of the underlying code in `rsrl` this weekend and will push something soon. This replaces many of the WET traits with a single trait called `Handler`. This trait looks like the following:
```rust
pub trait Handler<M> {
    type Response;
    type Error;

    fn handle(&mut self, message: M) -> Result<Self::Response, Self::Error>;
}
```
I think it makes a lot of sense to design the framework to work in an event-driven manner. Furthermore, I'm super keen to integrate first-class support for futures once async traits drop on stable. This will make it much easier to integrate later down the line.
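As a rough illustration of how an agent might plug into this, consider the sketch below. The message type and update rule are made up for the example and are not part of the crate:

```rust
use std::collections::HashMap;

// Made-up message type for the example: a single environment transition.
struct Transition {
    from: usize,
    reward: f64,
    to: usize,
}

// Made-up agent: tabular state values updated with a TD(0)-style rule.
struct TabularAgent {
    values: HashMap<usize, f64>,
    alpha: f64,
    gamma: f64,
}

impl Handler<Transition> for TabularAgent {
    type Response = ();
    type Error = ();

    fn handle(&mut self, msg: Transition) -> Result<(), ()> {
        let v_to = self.values.get(&msg.to).copied().unwrap_or(0.0);
        let v_from = self.values.entry(msg.from).or_insert(0.0);

        // TD(0): V(s) <- V(s) + alpha * (r + gamma * V(s') - V(s)).
        *v_from += self.alpha * (msg.reward + self.gamma * v_to - *v_from);

        Ok(())
    }
}
```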
Now, taking all of this into consideration, here's the proposal: split the crate into a larger family, as in the `num` ecosystem. I'm not sure what the best way would be yet, but I think your suggestion to have a traits crate is the first step. The next would be to have a crate for distributed experiments etc. (as with the `ray` framework in Python). This would be the perfect place for the code you're proposing!
So, please do keep me updated; I will create this crate at some point soon, and the more features and syntactic sugar for running experiments, the better.
Tom
A generalized handler pattern like this actually comes quite close to a 'hand-rolled' generator, and I think in almost all scenarios it is preferable over the raw Generator trait, which, as you suggest, is mostly relevant as syntactic sugar.
Given that this is the current unstable `Generator` trait:
```rust
pub trait Generator<R = ()> {
    type Yield;
    type Return;

    fn resume(self: Pin<&mut Self>, resume: R) -> GeneratorState<Self::Yield, Self::Return>;
}
```
I'd say what you're currently doing looks perfect (and a better fit for AI use cases), and I'm excited to check it out :)
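Just to illustrate the correspondence, here is a stable stand-in that drives any `Handler` in the same yield/return shape as a generator. Mapping the `Error` channel onto the generator's terminal value is purely my own framing for this sketch, not how the crate treats errors:

```rust
// Illustration only: a stable stand-in for GeneratorState, plus an
// adapter that resumes any Handler with a message. Treating Err as the
// terminal "return" value is an assumption of this sketch.
enum StepState<Y, R> {
    Yielded(Y),
    Complete(R),
}

fn resume_with<H, M>(handler: &mut H, message: M) -> StepState<H::Response, H::Error>
where
    H: Handler<M>,
{
    match handler.handle(message) {
        Ok(response) => StepState::Yielded(response),
        Err(terminal) => StepState::Complete(terminal),
    }
}
```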
Hey @wschella. Not sure if you've seen the latest version of `rsrl`. Do let me know whether this fixes your issues and we can close this. If not, it would be good to hear what you think so I can try to address things in future versions.
Hi, thanks for the changes! I haven't had the time to actually do some implementations, but as far as I can see the `Handler` trait should make it possible for me to whip something up. So if it's up to me, this issue can be closed.
Kind regards and much thanks
Ok excellent!
Hi! First of all, big thanks for this crate.
I had some questions regarding agents (or controllers) that require mutability (`&mut self`) for selecting an action, which is currently not supported. An example class of agents are online solvers or planners in partially observable environments. These agents often learn an approximate policy, or some other heuristic, and use that heuristic to guide a policy search or other planning method relevant only to the current 'state'. However, this current state is usually some function of the entire history of actions and observations up until that point. Thus updating this current state requires mutating some field in the agent, and thus requires `&mut self`. This also implies the agent needs some way to be 'reset' after an episode, akin to the current `handle_terminal` of the `OnlineLearner` trait, although a copy could also be sufficient to start from the blank 'initial state'. A concrete example of this is the infinite POMDP [1] (to which my research is related), but in fact it is relevant to any agent that incorporates data from the current episode to have an effect on planning.
Now I was wondering: would you consider changing the `Controller` trait to take `&mut self` and adding a `handle_terminal` method? Conceptually this generalizes the `Controller` trait, as every struct implementing the current trait could trivially implement the generalization. But this does not seem to be required for the Deep RL agents this crate is focused on (and as such would dirty the interface).

Any feedback would be appreciated. To me it seems useful to support this case (albeit in the types alone), as it would allow writing more agents against the type interface this crate provides. But I understand I might be biased.
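For concreteness, the generalization I have in mind would look something like the sketch below. The trait and method names are mine, chosen to mirror the existing `Controller`/`OnlineLearner` interfaces, and are not actual rsrl items:

```rust
// Sketch of the proposed generalisation; all names are illustrative only.
pub trait MutController<S, A> {
    /// Select an action, possibly updating internal per-episode state
    /// (e.g. the belief over histories maintained by a POMDP planner).
    fn sample_behaviour(&mut self, state: &S) -> A;

    /// Reset (or otherwise clear) per-episode state once an episode
    /// terminates, akin to OnlineLearner's handle_terminal.
    fn handle_terminal(&mut self);
}
```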
Thanks in advance.
PS: I have also wondered whether it would be useful to separate some types (mostly the learner, domain, and controller ones) into a separate crate, as that would allow implementing an agent against these traits without pulling in all the dependencies for all other agents and domains. But I'll keep that proposal for a separate issue.
[1] Doshi-Velez, Finale. ‘The Infinite Partially Observable Markov Decision Process’. In Advances in Neural Information Processing Systems 22, edited by Y. Bengio, D. Schuurmans, J. D. Lafferty, C. K. I. Williams, and A. Culotta, 477–485. Curran Associates, Inc., 2009. http://papers.nips.cc/paper/3780-the-infinite-partially-observable-markov-decision-process.pdf