Tensorforce: a TensorFlow library for applied reinforcement learning

Future plans and DDPG implementation #40

Closed ViktorM closed 7 years ago

ViktorM commented 7 years ago

Hi,

Can you share some plans about the roadmap and which algorithms will be added? In particular, are there any plans for a DDPG implementation with recent improvements: https://arxiv.org/abs/1704.03073 and https://arxiv.org/abs/1707.01495?

michaelschaarschmidt commented 7 years ago

Hi,

So with regards to algorithms, we prioritise what we see as having the potential to become a new 'standard' method. That means we won't add every new paper, but rather well-established methods or new approaches that seem like natural progress (e.g. parameter space noise/noisy nets seem like a sensible addition even without waiting a year to see if they're replaced). DDPG could certainly be added; another example would be a hybrid policy gradient/DQN method like PGQ.
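
For context, here is a minimal sketch of the parameter space noise idea (after Plappert et al.); it assumes the policy weights are plain NumPy arrays and is purely illustrative, not Tensorforce code:

```python
import numpy as np

# Sketch of parameter space noise: instead of perturbing actions,
# perturb a copy of the policy weights once per rollout, so exploration
# is temporally consistent within an episode. `weights` is assumed to
# be a list of NumPy arrays (an assumption for illustration).

SIGMA = 0.1  # noise scale; in practice adapted to a target action distance

def perturbed_copy(weights, sigma=SIGMA, rng=np.random):
    # Gaussian perturbation of every parameter tensor.
    return [w + rng.normal(scale=sigma, size=w.shape) for w in weights]
```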

The very immediate next tasks are a bit more mundane (see the other issues): fixing some small config issues, improving logging and model import/export, introducing a generic natural actor-critic as a basis for TRPO variants, Docker support, and benchmarking. So in general, we will prioritise making the existing code cleaner, more robust, and more reliable before adding more features. In particular, we feel that adding more algorithms is much easier if we really focus on getting the modularisation right. Hope that answers your query, and of course we would assist you in incorporating DDPG if you urgently need it.

ViktorM commented 7 years ago

Thanks for such a detailed reply, Michael!

Yes, I think DDPG can already be called a 'standard', well-established method, starting from the benchmarking paper. The first paper on dexterous manipulation suggests just a small extension of it, plus making it asynchronous. I can start working on an implementation if you accept contributions and can offer a bit of support with following your test and code standards.
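
For reference, DDPG's two core updates fit in a few lines: bootstrapped TD targets via target networks, plus Polyak averaging of the target weights. This is a sketch assuming hypothetical `target_actor`/`target_critic` callables and weights as NumPy arrays, not Tensorforce code:

```python
import numpy as np

GAMMA = 0.99  # discount factor
TAU = 0.005   # soft target-update rate

def ddpg_targets(rewards, next_states, terminals, target_actor, target_critic):
    # Bootstrapped TD target: r + gamma * Q'(s', mu'(s')), zeroed at terminals.
    next_actions = target_actor(next_states)
    next_q = target_critic(next_states, next_actions)
    return rewards + GAMMA * (1.0 - terminals) * next_q

def soft_update(target_weights, source_weights, tau=TAU):
    # Polyak averaging: target networks slowly track the learned networks,
    # which stabilises the bootstrapped targets above.
    for t, s in zip(target_weights, source_weights):
        t[...] = (1.0 - tau) * t + tau * s
```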

And if you could add parameter-noise variants of TRPO and other algorithms, that would be great news too!

And one more question: do you have any plans for a PPO implementation as well? It's very similar to TRPO, and a distributed version of it was used in the recent DeepMind parkour locomotion paper: https://arxiv.org/abs/1707.02286
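
For reference, the clipped surrogate objective PPO optimises can be sketched in a few lines (illustrative, not Tensorforce code); `ratio` is pi_new(a|s) / pi_old(a|s) for the sampled actions:

```python
import numpy as np

EPSILON = 0.2  # clipping range from the PPO paper

def ppo_clip_loss(ratio, advantages):
    # Pessimistic bound: take the elementwise minimum of the unclipped
    # and clipped surrogate terms, then negate for minimisation.
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - EPSILON, 1.0 + EPSILON) * advantages
    return -np.mean(np.minimum(unclipped, clipped))
```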

michaelschaarschmidt commented 7 years ago

I agree that DDPG is established enough to warrant addition; we just have not gotten to it yet because there are so many things to do on the general structure (I'd argue refactoring the optimisation package so that TRPO fits in more naturally should be very high on that list). So if you want to contribute DDPG, that is very welcome, as long as it integrates with the modularisation and coding style.

Device execution semantics are a whole separate issue. From our point of view, there are many approaches to asynchronous, thread-parallel, and distributed process execution, and there is not much systematic analysis of how to choose amongst them for a given problem; it's often more of an 'obviously collecting more data works better, and here is how many actors worked best for our problem'. What I personally would want are execution wrappers around the model that implement different approaches to data collection and device execution (Gorila, A3C, GA3C, PAAC, ...), so we can compare and analyse them more systematically; a rough sketch of what I mean follows below. But doing this well is really difficult and will take a lot of time. It's also something I am personally very interested in, but as we are doing this on the side of our PhDs, it's hard to give timelines.
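
To make the 'execution wrapper' idea concrete, here is a rough interface sketch. This is a hypothetical design, not Tensorforce code; the `act`/`observe`/`execute` names are assumptions for illustration:

```python
from abc import ABC, abstractmethod

class ExecutionWrapper(ABC):
    # Each subclass implements one data-collection/device-execution scheme
    # (single-threaded, A3C-style threads, GA3C-style GPU batching, ...),
    # so schemes can be swapped and compared on the same model.
    def __init__(self, model, environments):
        self.model = model
        self.environments = environments

    @abstractmethod
    def run(self, num_episodes):
        """Collect experience and drive model updates under this scheme."""

class SynchronousRunner(ExecutionWrapper):
    # The simplest scheme: one environment, acting and observing in lockstep.
    def run(self, num_episodes):
        env = self.environments[0]
        for _ in range(num_episodes):
            state, terminal = env.reset(), False
            while not terminal:
                action = self.model.act(state)
                state, terminal, reward = env.execute(action)
                self.model.observe(reward=reward, terminal=terminal)
```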

AlexKuhnle commented 7 years ago

Just wanted to add that Gitter is probably best for support regarding contributions like DDPG, and yes, we're happy to be of help and give guidance.

michaelschaarschmidt commented 7 years ago

Implemented PPO (which you mentioned); I think this can generally be closed.
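
For anyone landing here later, a hypothetical usage sketch; the agent class and argument names have varied across Tensorforce versions, so treat this as illustrative rather than a verified API:

```python
# Illustrative only; check the docs for the exact agent name and
# arguments in your Tensorforce version.
from tensorforce.agents import PPOAgent

agent = PPOAgent(
    states_spec=dict(type='float', shape=(4,)),
    actions_spec=dict(type='int', num_actions=2),
    network_spec=[dict(type='dense', size=32), dict(type='dense', size=32)],
)
```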