Closed ShaniGam closed 6 years ago
Yeah. I also failed to make the GRU policy stable in all environments. The OpenAI baselines implementation has exactly the same problem. I'm not sure whether a different value for num-stack will change much. I guess it requires a larger hyper-parameter search.
For a recurrent policy only PPO works well.
That's very disappointing (not your fault, of course). I wish that your A3C implementation worked with PyTorch versions newer than 0.1.12 on Ubuntu :/
In that regard, why is a recurrent policy currently not supported for ACKTR?
@timmeinhardt it would take some extra work, since the KFAC approximation is different for every type of layer.
For RNNs it requires something like this: https://openreview.net/forum?id=HyMTkQZAb
I will probably only have time to add it myself in late September.
@ShaniGam there was a bug in recurrent policy. Try again.
@ikostrikov Still not working :/
Worked for me after increasing num_steps to 20. It still needs more hyper-parameter tuning since it's super unstable (the lr probably needs to come down).
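For reference, a launch command along those lines might look like this (flag names are taken from the repo's `main.py` as I remember it; the exact values here are illustrative guesses, not tuned settings):

```shell
# Hypothetical invocation: recurrent A2C on Breakout with a longer rollout
# (--num-steps 20) and a reduced learning rate. --num-stack 1 because the
# GRU's hidden state already carries frame history.
python main.py --env-name "BreakoutNoFrameskip-v4" \
    --algo a2c --recurrent-policy \
    --num-steps 20 --num-stack 1 --lr 1e-4
```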
First, great work!! I've decided to "upgrade" and use your A2C implementation instead of your A3C one, but I was surprised to see in your code that the changes aren't as minor as I thought they would be. For example, your default is the FF version rather than LSTM (or GRU in this version). I tried running the code on Breakout with --recurrent-policy and the model didn't seem to learn anything (it works without the flag, though). Are there any other parameters I should change if I want to use it? E.g., maybe num-stack should be changed to 1?
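On the num-stack question: the intuition is that frame stacking exists to give a feed-forward policy a short observation history, while a recurrent policy carries that history in its hidden state, so num-stack 1 is a reasonable starting point. A minimal NumPy sketch of a GRU step (a toy stand-in, not the repo's actual policy class) shows how the hidden state accumulates information across timesteps:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class MiniGRUCell:
    """Toy GRU cell. The hidden state h summarises past observations,
    which is why frame stacking is largely redundant for a recurrent
    policy: the recurrence supplies the temporal context instead."""

    def __init__(self, input_size, hidden_size, seed=0):
        rng = np.random.default_rng(seed)
        s = 1.0 / np.sqrt(hidden_size)
        # One weight matrix each for the update gate, reset gate,
        # and candidate hidden state; each acts on [x; h].
        self.Wz = rng.uniform(-s, s, (hidden_size, input_size + hidden_size))
        self.Wr = rng.uniform(-s, s, (hidden_size, input_size + hidden_size))
        self.Wh = rng.uniform(-s, s, (hidden_size, input_size + hidden_size))

    def step(self, x, h):
        xh = np.concatenate([x, h])
        z = sigmoid(self.Wz @ xh)                               # update gate
        r = sigmoid(self.Wr @ xh)                               # reset gate
        h_tilde = np.tanh(self.Wh @ np.concatenate([x, r * h]))  # candidate
        return (1.0 - z) * h + z * h_tilde

cell = MiniGRUCell(input_size=4, hidden_size=8)

# Feed two different observation sequences one frame at a time;
# the final hidden states differ, i.e. the cell remembers history.
h_a = np.zeros(8)
h_b = np.zeros(8)
for t in range(5):
    h_a = cell.step(np.full(4, float(t)), h_a)       # sequence 0,1,2,3,4
    h_b = cell.step(np.full(4, float(4 - t)), h_b)   # sequence 4,3,2,1,0
```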