ikostrikov / pytorch-a2c-ppo-acktr-gail

PyTorch implementation of Advantage Actor Critic (A2C), Proximal Policy Optimization (PPO), Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation (ACKTR) and Generative Adversarial Imitation Learning (GAIL).
MIT License
3.56k stars, 831 forks

Adding Prioritized Experience Replay #5

Closed. AjayTalati closed this issue 6 years ago.

AjayTalati commented 6 years ago

Hi, I was wondering if a Prioritized Experience Replay buffer could be added to PPO?

Something similar is done with DDPG in the paper "Leveraging Demonstrations for Deep Reinforcement Learning on Robotics Problems with Sparse Rewards".

I'm guessing PPO would be more stable, though?

Perhaps OpenAI's prioritized replay_buffer from the baselines repo could be used?
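
For context, here is a minimal sketch of proportional prioritized replay, the idea being asked about. It is not the baselines implementation and not part of this repo: the class name `PrioritizedBuffer`, its methods, and the default `alpha`/`beta` values are all hypothetical choices for illustration. Transitions are sampled with probability proportional to `priority ** alpha`, and importance-sampling weights are returned to correct for the non-uniform sampling.

```python
import random


class PrioritizedBuffer:
    """Proportional prioritized replay (hypothetical minimal sketch)."""

    def __init__(self, capacity, alpha=0.6):
        self.capacity = capacity
        self.alpha = alpha  # 0 = uniform sampling, 1 = fully proportional
        self.items = []
        self.priorities = []
        self.next_idx = 0

    def add(self, transition):
        # New transitions get the current max priority so they are
        # likely to be sampled soon after being stored.
        p = max(self.priorities, default=1.0)
        if len(self.items) < self.capacity:
            self.items.append(transition)
            self.priorities.append(p)
        else:
            # Buffer full: overwrite the oldest slot (ring buffer).
            self.items[self.next_idx] = transition
            self.priorities[self.next_idx] = p
        self.next_idx = (self.next_idx + 1) % self.capacity

    def sample(self, batch_size, beta=0.4):
        scaled = [p ** self.alpha for p in self.priorities]
        total = sum(scaled)
        probs = [s / total for s in scaled]
        idxs = random.choices(range(len(self.items)), weights=probs, k=batch_size)
        n = len(self.items)
        # Importance-sampling weights, normalized by the largest weight
        # in this batch so that they only scale updates down.
        weights = [(n * probs[i]) ** (-beta) for i in idxs]
        max_w = max(weights)
        weights = [w / max_w for w in weights]
        return idxs, [self.items[i] for i in idxs], weights

    def update_priorities(self, idxs, td_errors, eps=1e-6):
        # Priority is the absolute TD error; eps keeps it strictly positive.
        for i, err in zip(idxs, td_errors):
            self.priorities[i] = abs(err) + eps
```

A linear scan over priorities like this costs O(N) per sample; a sum tree is the usual choice when the buffer is large.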

ikostrikov commented 6 years ago

For PPO, it will probably have a much smaller effect since the replay buffer is much smaller.
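
One rough way to see this: PPO's buffer holds only the latest on-policy rollout, and each update runs several epochs over all of it, so nearly every transition gets used regardless of how it would be ranked. The back-of-the-envelope sketch below compares how likely a single transition is to be drawn under plain uniform sampling. All sizes are hypothetical illustrative numbers, not this repo's defaults, and the helper `prob_sampled_at_least_once` is made up for the example.

```python
def prob_sampled_at_least_once(buffer_size, num_draws):
    """Chance that one given transition appears in `num_draws`
    uniform draws with replacement from `buffer_size` items."""
    return 1.0 - (1.0 - 1.0 / buffer_size) ** num_draws


# Hypothetical on-policy setup: 2048 transitions per rollout and
# 10 epochs over them. Sampling with replacement is a pessimistic
# model here, since PPO normally visits every sample once per epoch.
ppo_like = prob_sampled_at_least_once(buffer_size=2048, num_draws=10 * 2048)

# Hypothetical off-policy setup: a 1M-transition buffer with a
# single batch of 64 drawn.
ddpg_like = prob_sampled_at_least_once(buffer_size=1_000_000, num_draws=64)

print(f"PPO-like:  {ppo_like:.5f}")
print(f"DDPG-like: {ddpg_like:.7f}")
```

With a large off-policy buffer, most transitions are rarely revisited, so choosing which ones to replay matters; with a small rollout buffer there is little left for prioritization to change.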