amolchanov86 opened this issue 7 years ago
Hey,
sorry for the late reply! The most important setting, reward normalization, is actually hardcoded into filter_env.py for Reacher-v1. The other hyperparameters should be fine. Have you tried multiple times? Are at least the two pendulum tasks working?
Cheers Simon
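For context, reward normalization here just means rescaling the reward the environment returns before the agent sees it. The sketch below shows one way such a filter could be written (a thin wrapper that divides rewards by a running standard-deviation estimate); it is an illustrative assumption, not the actual contents of filter_env.py, and the class and parameter names are made up for the example.

```python
import numpy as np


class RewardNormalizer(object):
    """Illustrative reward filter: divides rewards by a running std estimate.

    Hypothetical sketch of the kind of normalization filter_env.py hardcodes
    for Reacher-v1; the real file may scale rewards differently.
    """

    def __init__(self, env):
        self.env = env
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0  # sum of squared deviations (Welford's algorithm)

    def reset(self):
        return self.env.reset()

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        # Online update of the reward mean/variance (Welford).
        self.count += 1
        delta = reward - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (reward - self.mean)
        # Rescale so reward magnitudes stay roughly O(1) for the critic;
        # fall back to no scaling until a positive variance estimate exists.
        if self.count > 1 and self.m2 > 0.0:
            std = np.sqrt(self.m2 / (self.count - 1))
        else:
            std = 1.0
        return obs, reward / std, done, info
```

Usage would then be something along the lines of `env = RewardNormalizer(gym.make('Reacher-v1'))` before handing the environment to the training loop.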
Hi, thanks for the reply!
Hey, sorry for the late reply.
I never got Reacher-v1 to "solve", but it came close (as you can see in the gif in the readme). For my evaluations I used the commit before "fixes in replay memory", but I don't believe performance actually got worse after that commit. I don't use prioritized experience replay; the list of improvements is only a roadmap. I haven't had time to work on it so far, and by now it doesn't seem like such a big improvement compared to other things like auxiliary tasks in A3C. Maybe I will release a new TensorFlow deep RL repo, though, where we can include it.
Ah, and no, I haven't used it with convolutional nets on pixels yet. But that should also come soon (in the new repo, though).
Cheers
Hi, thanks for the help!
Hi, I have just tried running Reacher-v1 for 1,000,000 timesteps with the default settings and it didn't learn anything (it just gets stuck around a test reward of -12), but it looks like you got it running with some settings. What were those settings?