Closed AloshkaD closed 5 years ago
Hello, PPO is meant to be on-policy (the policy that generates the samples must be the same one being optimized, not an older version), so I don't think it really makes sense to use an experience replay buffer in that case.
My bad, I meant to say Dueling DQN. I was coding PPO2 and it was stuck in my head.
Well then, I don't really understand your question either. Prioritized experience replay is already implemented for DQN in Stable Baselines.
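For anyone landing here: a minimal sketch of proportional prioritized experience replay, just to illustrate the idea (this is my own toy version, not the Stable Baselines implementation — in Stable Baselines you would just enable the built-in buffer on DQN, e.g. via the `prioritized_replay` flag):

```python
import random

class PrioritizedReplayBuffer:
    """Toy proportional prioritized replay buffer (illustration only).

    Transitions are sampled with probability proportional to priority^alpha,
    and importance-sampling weights correct for the non-uniform sampling.
    """

    def __init__(self, capacity, alpha=0.6):
        self.capacity = capacity
        self.alpha = alpha          # how strongly priorities skew sampling
        self.storage = []           # stored transitions
        self.priorities = []        # one priority per stored transition
        self.pos = 0                # next slot to overwrite when full

    def add(self, transition):
        # New transitions get the current max priority so they are
        # sampled at least once before their TD error is known.
        max_prio = max(self.priorities, default=1.0)
        if len(self.storage) < self.capacity:
            self.storage.append(transition)
            self.priorities.append(max_prio)
        else:
            self.storage[self.pos] = transition
            self.priorities[self.pos] = max_prio
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size, beta=0.4):
        # Sampling probability proportional to priority^alpha.
        probs = [p ** self.alpha for p in self.priorities]
        total = sum(probs)
        probs = [p / total for p in probs]
        idxs = random.choices(range(len(self.storage)), weights=probs, k=batch_size)
        # Importance-sampling weights, normalized by the max for stability.
        n = len(self.storage)
        weights = [(n * probs[i]) ** (-beta) for i in idxs]
        max_w = max(weights)
        weights = [w / max_w for w in weights]
        return [self.storage[i] for i in idxs], idxs, weights

    def update_priorities(self, idxs, td_errors, eps=1e-6):
        # After a training step, priorities become |TD error| + eps.
        for i, err in zip(idxs, td_errors):
            self.priorities[i] = abs(err) + eps
```

A real implementation would use a sum-tree for O(log n) sampling instead of recomputing the full distribution each call, which is what the Stable Baselines buffer does internally.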
I see, I missed that in the code. I'll give it a second look. Thanks!
I need to test PPO2 with prioritized experience replay, and I wonder if anyone has written a similar integration before I go ahead and write it from scratch.