Off-Policy Algos - ACER, DDPG and SAC

ikostrikov / pytorch-a2c-ppo-acktr-gail

PyTorch implementation of Advantage Actor Critic (A2C), Proximal Policy Optimization (PPO), Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation (ACKTR) and Generative Adversarial Imitation Learning (GAIL).

MIT License

3.57k stars, 829 forks

Off-Policy Algos - ACER, DDPG and SAC #183

Open Riashat opened 5 years ago

Riashat commented 5 years ago

Would you be adding off-policy algorithms to this repository any time soon, such as ACER/SAC (which should be compatible with both continuous-action MuJoCo and discrete-action ALE tasks) and DDPG/TD3 for continuous control?

It would be useful to have all these algorithms implemented within the same repo. I know this repo is being used as the standard codebase for a lot of papers these days.

ikostrikov commented 5 years ago

I'm planning to add SAC at some point, but there are some difficulties: it might be hard to make everything efficient within a single repo.