Agents requiring mutability for evaluation (action selection)

wschella commented 4 years ago

Hi! First of all, big thanks for this crate.

I had some questions regarding agents (or controllers) that require mutability (&mut self) for selecting an action, which is currently not supported. One example class of such agents is online solvers or planners in partially observable environments. These agents often learn an approximate policy, or some other heuristic, and use that heuristic to guide a policy search or other planning method relevant only to the current 'state'. However, this current state is usually some function of the entire history of actions and observations up until that point.

Thus updating this current state requires mutating some field in the agent, and thus requires a &mut self. This also implies the agent needs to be 'reset' after an episode, akin to the current handle_terminal of the OnlineLearner trait, although a copy could also be sufficient to start from the blank 'initial state'.
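To make the requirement concrete, here is a minimal sketch of an agent whose action depends on the whole observation history. The names `StatefulAgent`, `HistoryAgent` and `run_episode` are invented for illustration and are not part of rsrl; the "planning" step is a deliberately trivial stand-in.

```rust
/// Hypothetical trait (not rsrl's API): selecting an action may mutate the agent.
pub trait StatefulAgent<O, A> {
    /// Choose an action, folding the new observation into the internal state.
    fn act(&mut self, observation: &O) -> A;
    /// Discard per-episode state so the next episode starts from a blank slate.
    fn reset(&mut self);
}

/// Toy agent whose choice is a function of every observation seen so far.
#[derive(Default)]
pub struct HistoryAgent {
    history: Vec<i32>,
}

impl StatefulAgent<i32, bool> for HistoryAgent {
    fn act(&mut self, observation: &i32) -> bool {
        // Updating the history is what forces `&mut self` here.
        self.history.push(*observation);
        // Stand-in for planning: act when the running sum is positive.
        self.history.iter().sum::<i32>() > 0
    }

    fn reset(&mut self) {
        self.history.clear();
    }
}

/// Run one episode and reset the agent afterwards.
pub fn run_episode<Ag: StatefulAgent<i32, bool>>(agent: &mut Ag, observations: &[i32]) -> Vec<bool> {
    let actions: Vec<bool> = observations.iter().map(|o| agent.act(o)).collect();
    agent.reset();
    actions
}
```

With a `&self` signature the `push` would not compile without interior mutability such as `RefCell`, which is the crux of the question.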

A concrete example of this is the infinite POMDP [1] (to which my research is related), but in fact it is relevant to any agent that incorporates data from the current episode to have an effect on planning.

Now I was wondering:

Would you be interested in having a compatible API for those agents in this crate?
If so, how would you see an implementation be incorporated here? (I could make a PR)
- Change the controller trait to take a &mut self and add a handle_terminal method. Conceptually this generalizes the Controller trait, as every struct implementing the current trait could trivially implement the generalization. But this does not seem to be required for the Deep RL agents this crate is focused on (and as such would dirty the interface).
- Add an OnlineController trait with the proposed changes, and implement it for all the controllers (which should be trivial). Conceptually you have an implementation for OnlineController when you have one for Controller. It should be possible to express that using Rust's trait system (playground example).
- Anything else you suggest.
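The second option could look roughly like the sketch below. Both traits are simplified stand-ins written for this comment, not rsrl's actual definitions, and `sample_mut`, `Constant` and `Counter` are made-up names. The mutable method gets a distinct name so that it cannot shadow the immutable one inside the blanket impl.

```rust
/// Simplified stand-in for an immutable controller trait (assumed shape).
pub trait Controller<S, A> {
    fn sample(&self, state: &S) -> A;
}

/// Hypothetical generalisation: selection may mutate, and episodes can be reset.
pub trait OnlineController<S, A> {
    fn sample_mut(&mut self, state: &S) -> A;
    fn handle_terminal(&mut self);
}

/// Every immutable controller is trivially an online controller.
impl<S, A, C: Controller<S, A>> OnlineController<S, A> for C {
    fn sample_mut(&mut self, state: &S) -> A {
        self.sample(state)
    }

    /// Nothing to reset for a controller without per-episode state.
    fn handle_terminal(&mut self) {}
}

/// A stateless controller: only implements `Controller`.
pub struct Constant(pub u8);

impl Controller<(), u8> for Constant {
    fn sample(&self, _state: &()) -> u8 {
        self.0
    }
}

/// A stateful controller: implements `OnlineController` directly.
#[derive(Default)]
pub struct Counter {
    steps: u8,
}

impl OnlineController<(), u8> for Counter {
    fn sample_mut(&mut self, _state: &()) -> u8 {
        self.steps += 1;
        self.steps
    }

    fn handle_terminal(&mut self) {
        self.steps = 0;
    }
}
```

One caveat with this design: the direct impl for `Counter` only passes the coherence check because the traits and the type live in the same crate. A downstream crate would hit the usual restrictions around blanket impls.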

Any feedback would be appreciated. To me it seems useful to support this case (albeit in a type alone), as it would allow writing more agents against the type interface this crate provides. But I understand I might be biased.

Thanks in advance.

PS: I have also wondered if it would be useful to separate some types (mostly the learner, domain and controller ones) into a separate crate, as that would allow implementing an agent against these traits without pulling in all the dependencies for all other agents and domains. But I'll keep that proposal for a separate issue.

[1] Doshi-Velez, Finale. ‘The Infinite Partially Observable Markov Decision Process’. In Advances in Neural Information Processing Systems 22, edited by Y. Bengio, D. Schuurmans, J. D. Lafferty, C. K. I. Williams, and A. Culotta, 477–485. Curran Associates, Inc., 2009. http://papers.nips.cc/paper/3780-the-infinite-partially-observable-markov-decision-process.pdf.

tspooner commented 4 years ago

Hey!

First off, sorry for the delay in responding (I haven't been getting alerts about issues). I do really appreciate the input and I'm glad that you're getting some use out of the code.

So I totally agree that there are some real limitations with the interfaces at the moment. I've been meaning to take another look into this to try and cover broader classes of algorithms. I'll need to spend some time looking into the paper you cite to make a proper decision. My only concern is that the crate does not currently support planning. While this would be something worth having in future, I do want to try and avoid making overly complicated traits to make everything integrate right now. My priority is having a set of terse but parsimonious learning abstractions.

Anyway, I'm in the (very slow I realise) process of doing some major changes to the framework which may well change a lot of this anyway. For example, I do think that the Controller trait is actually somewhat ill-defined, and I'm not keen on my choice to have "target" and "behaviour" policies separated here. As part of that I'll prioritise looking into your suggestions. For now, are you OK to fork the crate and implement the traits you need? If you are, please link it here so I can refer to it as I go.

I think the idea of separating some of the core abstractions into a separate crate makes a lot of sense. This would be akin to something like num and num-traits. I will absolutely do this so please do watch out for that as input is always welcome.

Regards, Tom

wschella commented 4 years ago

Thanks for the response, no sweat for the delay!

Designing abstractions is just a very hard problem to tackle given the broad scope of artificial 'intelligence'. The reason this online controller or planner would be 'necessary' is Rust's mutability & ownership rules; in other languages it would not even be a problem at all (until of course you end up with bugs). I've thought about it some more, and mutability during evaluation actually feels necessary for a very broad class of agents. I've mentioned online solvers and planners, but anything that does 'continual learning' would fall under it, which in fact every RL agent should be able to do, depending on the needs of the application. Unfortunately I can't give many pointers to influential papers, as my overview of the field is quite limited.

Also, the nature of defining handler-type methods already makes it quite difficult to reflect any notion of sequential calls in the API and types. Currently, this is the responsibility of the evaluation/training code. So I'm not even sure you can cleanly adapt the traits to the 'online learner/planner' paradigm at all without completely revamping the crate, or defining a crate that is completely disconnected from the rest of the crate.

So there's definitely no need to prioritize my use case in any way. It's hard, and I've no clue on how to do it myself in any way that is not a hack. I currently solved my problem by just having a local crate with all the types (since there were some problems with enabling blas here), where I just added a mut to the controller :laughing:.

I will keep an eye on separating out the types!

Thanks for the effort, there's no hurry.

wschella commented 4 years ago

Concretely I feel like the generator pattern would be a flexible and nice abstraction over agents that would fit a very broad scope of use cases while maintaining a clean interface.

Since generators are an unstable feature (and will likely be so for a while), I'll experiment a bit with a hand-rolled implementation somewhere in the (not super near) future. I'll try to report progress here, which should by no means be interpreted as a request to consider my conclusions for this crate.

Frankly I just get excited about implementing environments and agents as generators and then implementing executors, experiments & evaluation over them.

tspooner commented 4 years ago

Hey,

I agree completely. I think generators would be a great thing to support. My only concern is that I don't want to rely on unstable features in the core crate. I've been desperate for some of the features like impl specialisation and GATs for a long time, but held out in favor of stable Rust.

However, I have a proposal. I have been re-writing a lot of the underlying code in rsrl this weekend and will push something soon. This replaces many of the wet traits with a single trait called Handler. This trait looks like the following:

pub trait Handler<M> {
    type Response;
    type Error;

    fn handle(&mut self, message: M) -> Result<Self::Response, Self::Error>;
}

I think it makes a lot of sense to design the framework to work in an event-driven manner. Furthermore, I'm super keen to integrate first-class support for futures once async traits drop on stable. This will make it much easier to integrate later down the line.
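As a rough illustration of how a stateful agent might sit on top of this, the sketch below repeats the `Handler` trait as quoted above and implements it for two message types. `Observation`, `EpisodeEnd` and `AveragingAgent` are hypothetical names made up for this example, not types from rsrl.

```rust
/// The trait as quoted in this thread.
pub trait Handler<M> {
    type Response;
    type Error;

    fn handle(&mut self, message: M) -> Result<Self::Response, Self::Error>;
}

/// Hypothetical message: a new observation arrived, please pick an action.
pub struct Observation(pub f64);

/// Hypothetical message: the episode is over.
pub struct EpisodeEnd;

/// Toy agent that keeps a running mean of the observations in the current episode.
#[derive(Default)]
pub struct AveragingAgent {
    sum: f64,
    count: u32,
}

impl Handler<Observation> for AveragingAgent {
    type Response = bool;
    type Error = String;

    fn handle(&mut self, message: Observation) -> Result<bool, String> {
        if !message.0.is_finite() {
            return Err("non-finite observation".to_string());
        }
        // Action selection mutates the agent, which `&mut self` permits.
        self.sum += message.0;
        self.count += 1;
        Ok(self.sum / f64::from(self.count) > 0.0)
    }
}

impl Handler<EpisodeEnd> for AveragingAgent {
    type Response = ();
    type Error = String;

    fn handle(&mut self, _message: EpisodeEnd) -> Result<(), String> {
        // Plays the role of the reset discussed at the top of the issue.
        *self = Self::default();
        Ok(())
    }
}
```

Because the message type is a trait parameter, `agent.handle(Observation(1.0))` and `agent.handle(EpisodeEnd)` resolve to different impls, each with its own response type.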

Now, taking all of this into consideration, here's the proposal: split the crate into a larger family as in the num ecosystem. I'm not sure what would be the best way yet, but I think your suggestion to have a traits crate is the first step. The next would be to have a crate for distributed experiments etc (as with the ray framework in Python). This would be the perfect place for code that you're proposing!

So, please do keep me updated as I will create this crate at some point soon and the more features and syntactic sugar for running experiments the better.

Tom

wschella commented 4 years ago

A generalized handler pattern like this actually comes quite close to a 'hand-rolled' generator, and I think in almost all scenarios it is preferable to the raw Generator trait, which, as you suggest, is usually just relevant as syntactic sugar.

Given that this is the current unstable Generator trait:

pub trait Generator<R = ()> {
    type Yield;
    type Return;
    fn resume(self: Pin<&mut Self>, resume: R) -> GeneratorState<Self::Yield, Self::Return>;
}

I'd say what you're currently doing looks perfect (and a better fit for AI use cases), and I'm excited to check it out :)
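For comparison, a hand-rolled generator on stable Rust might look something like the sketch below. `Step`, `Resumable` and `Countdown` are names invented here; the shape mirrors `resume` from the trait above but drops `Pin`, which only works because this toy state machine is not self-referential.

```rust
/// Stable stand-in for the unstable `GeneratorState` (hypothetical name).
#[derive(Debug, PartialEq)]
pub enum Step<Y, R> {
    Yielded(Y),
    Complete(R),
}

/// Hand-rolled generator interface: same shape as `resume`, minus `Pin`.
pub trait Resumable<In> {
    type Yield;
    type Return;

    fn resume(&mut self, input: In) -> Step<Self::Yield, Self::Return>;
}

/// Toy episode: accepts a reward on every resume, yields the number of steps
/// left, and completes with the accumulated reward once the horizon is reached.
pub struct Countdown {
    remaining: u32,
    total: i64,
}

impl Countdown {
    pub fn new(steps: u32) -> Self {
        Countdown { remaining: steps, total: 0 }
    }
}

impl Resumable<i64> for Countdown {
    type Yield = u32;
    type Return = i64;

    fn resume(&mut self, reward: i64) -> Step<u32, i64> {
        self.total += reward;
        if self.remaining == 0 {
            return Step::Complete(self.total);
        }
        self.remaining -= 1;
        Step::Yielded(self.remaining)
    }
}
```

Structurally this is a `handle(&mut self, message)` call whose response distinguishes "more to come" from "finished", which is why the two patterns end up so close.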

tspooner commented 4 years ago

Hey @wschella. Not sure if you've seen the latest version of rsrl. Do let me know whether this fixes your issues and we can close this. If not it would be good to hear what you think so I can try to address things in future versions.

wschella commented 4 years ago

Hi, thanks for the changes! I haven't had the time to actually do some implementations, but as far as I can see the handler trait should make it possible for me to whip something up. So if it's up to me this issue can be closed.

Kind regards and much thanks

tspooner commented 4 years ago

Ok excellent!

tspooner / rsrl

Agents requiring mutability for evaluation (action selection) #73