PyTorch implementation of Advantage Actor Critic (A2C), Proximal Policy Optimization (PPO), Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation (ACKTR) and Generative Adversarial Imitation Learning (GAIL).
MIT License
I converted your implementation to tensorflow but it does not work #264
Hi, I'd really be grateful if you could have a look at my implementation, which follows your code as a guide. I've tried experimenting with different hyperparameters, rewriting the whole implementation from scratch, following other PPO implementations in TensorFlow, and asking on SO / Reddit; no matter what I do, it won't work. I re-implemented the whole thing for the 4th time today and I'm really frustrated. If you have the time, please check it and let me know how it should be fixed. Here's the PPO class, which should be sufficient. For the full code, here's a Colab notebook set to start training on PongNoFrameskip-v4. Thanks in advance.
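For reference when comparing implementations, the core of PPO is the clipped surrogate objective, which both the PyTorch repo and any TensorFlow port must compute the same way. Below is a minimal NumPy sketch of that loss (illustrative only, not the repository's code; the function name and arguments are assumptions). A common porting bug is taking the ratio of probabilities instead of exponentiating the difference of log-probabilities, or clipping the advantages rather than the ratio:

```python
import numpy as np

def ppo_clipped_loss(log_probs_new, log_probs_old, advantages, clip_eps=0.2):
    """Sketch of PPO's clipped surrogate loss (to be minimized)."""
    # Probability ratio pi_new(a|s) / pi_old(a|s), computed in log space
    # for numerical stability.
    ratio = np.exp(log_probs_new - log_probs_old)

    # Unclipped surrogate and its clipped counterpart. Note: the ratio
    # is clipped, never the advantages.
    surr_unclipped = ratio * advantages
    surr_clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages

    # PPO maximizes the elementwise minimum; negate to express as a loss.
    return -np.mean(np.minimum(surr_unclipped, surr_clipped))
```

As a sanity check: on the first epoch after a rollout, `log_probs_new` equals `log_probs_old`, so the ratio is exactly 1 and the loss reduces to `-mean(advantages)`; an implementation that doesn't satisfy this on step one usually has a stale or detached `log_probs_old`.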