ikostrikov / pytorch-a2c-ppo-acktr-gail

PyTorch implementation of Advantage Actor Critic (A2C), Proximal Policy Optimization (PPO), Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation (ACKTR) and Generative Adversarial Imitation Learning (GAIL).
MIT License

GRU doesn't work for A2C #89

Closed ShaniGam closed 6 years ago

ShaniGam commented 6 years ago

First, great work!! I've decided to "upgrade" and use your A2C implementation instead of your A3C one, but I was surprised to see in your code that the changes aren't as minor as I thought they would be. For example, your default is the FF version, not LSTM (or GRU in this version). I tried running the code on Breakout with --recurrent-policy and the model didn't seem to learn anything (it works without it, though). Are there any other parameters I should change if I want to use it? e.g., maybe num-stack should be changed to 1?
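For reference, the failing run would be along these lines (a sketch reconstructed from the flags mentioned above; the script name, env id, and other flag names are assumed from the repo's README, not stated in this issue):

```shell
# A2C on Breakout with the recurrent (GRU) policy enabled.
# --recurrent-policy is the flag discussed in this issue;
# all other settings are left at the repo's defaults.
python main.py --env-name "BreakoutNoFrameskip-v4" --algo a2c --recurrent-policy
```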

ikostrikov commented 6 years ago

Yeah. I also failed to make the GRU policy stable in all environments. The OpenAI baselines implementation has exactly the same problem. I'm not sure whether a different value for num-stack will change a lot. I guess it requires a larger hyper-parameter search.

For a recurrent policy only PPO works well.
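A hedged sketch of a PPO run with the recurrent policy, since that is the combination reported to work here (flag names assumed from the repo's README; the hyper-parameter values are illustrative, not tuned):

```shell
# PPO with the recurrent (GRU) policy. --use-gae and the lowered
# learning rate follow the README's Atari PPO example; adjust per task.
python main.py --env-name "BreakoutNoFrameskip-v4" --algo ppo \
    --recurrent-policy --use-gae --lr 2.5e-4 --num-steps 128
```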

ShaniGam commented 6 years ago

That's very disappointing (not your fault, of course). I wish your A3C implementation worked with PyTorch versions newer than 0.1.12 on Ubuntu :/

timmeinhardt commented 6 years ago

In that regard, why is a recurrent policy currently not supported for ACKTR?

ikostrikov commented 6 years ago

@timmeinhardt it would take some extra work, since the KFAC approximation is different for every type of layer.

For RNNs it requires something like this: https://openreview.net/forum?id=HyMTkQZAb

I will probably only have time to add it myself in late September.

ikostrikov commented 6 years ago

@ShaniGam there was a bug in recurrent policy. Try again.

ShaniGam commented 6 years ago

@ikostrikov Still not working :/

erikwijmans commented 6 years ago

Worked for me after increasing num_steps to 20. Still needs more hyper-parameter tuning, as it's super unstable (probably the lr needs to come down). (attached visdom training-curve screenshot)
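For anyone reproducing this, a sketch of the corresponding command (flag names assumed from the repo's README; --num-steps 20 matches the num_steps change above, while the lowered --lr value is illustrative, reflecting the tuning suggestion rather than a tested setting):

```shell
# A2C + recurrent policy with a longer rollout than the default (5 steps)
# and a reduced learning rate to counter the reported instability.
python main.py --env-name "BreakoutNoFrameskip-v4" --algo a2c \
    --recurrent-policy --num-steps 20 --lr 1e-4
```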