PyTorch implementations of Advantage Actor-Critic (A2C), Proximal Policy Optimization (PPO), Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation (ACKTR), and Generative Adversarial Imitation Learning (GAIL).
Hi, I was wondering if a Prioritized Experience Replay buffer could be added to PPO?
They do something similar in "Leveraging Demonstrations for Deep Reinforcement Learning on Robotics Problems with Sparse Rewards", though with DDPG.
I'm guessing, though, that PPO would be more stable?
Perhaps OpenAI's prioritized replay buffer from the baselines repo could be used?
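For reference, here is a minimal sketch of a proportional prioritized replay buffer in the spirit of the one in baselines (which additionally uses a sum tree for O(log n) sampling). The class name and the `alpha`/`beta` defaults below are illustrative assumptions, not this repo's API, and note that since PPO is on-policy, reusing old transitions this way would also need some off-policy correction:

```python
# Sketch of a proportional prioritized replay buffer (illustrative only,
# not this repo's code). Priorities are |TD error| magnitudes supplied by
# the caller; alpha controls how strongly priorities skew sampling and
# beta controls the importance-sampling correction.
import numpy as np


class PrioritizedReplayBuffer:
    def __init__(self, capacity, alpha=0.6):
        self.capacity = capacity
        self.alpha = alpha
        self.storage = []                                  # stored transitions
        self.priorities = np.zeros(capacity, dtype=np.float64)
        self.pos = 0                                       # next write index (ring buffer)

    def add(self, transition):
        # New transitions get the current max priority so they are
        # sampled at least once before their TD error is known.
        max_prio = self.priorities.max() if self.storage else 1.0
        if len(self.storage) < self.capacity:
            self.storage.append(transition)
        else:
            self.storage[self.pos] = transition
        self.priorities[self.pos] = max_prio
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size, beta=0.4):
        prios = self.priorities[: len(self.storage)]
        probs = prios ** self.alpha
        probs /= probs.sum()
        idxs = np.random.choice(len(self.storage), batch_size, p=probs)
        # Importance-sampling weights correct the bias introduced by
        # non-uniform sampling; normalized by the max for stability.
        weights = (len(self.storage) * probs[idxs]) ** (-beta)
        weights /= weights.max()
        return [self.storage[i] for i in idxs], idxs, weights

    def update_priorities(self, idxs, td_errors, eps=1e-6):
        # Proportional variant: priority = |TD error| + eps.
        self.priorities[idxs] = np.abs(td_errors) + eps
```

Typical usage would be to `add` transitions during rollouts, `sample` a batch together with its importance weights, scale the loss by those weights, and then `update_priorities` with the new TD errors.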