rmst / ddpg

TensorFlow implementation of the DDPG algorithm from the paper Continuous Control with Deep Reinforcement Learning (ICLR 2016)
MIT License

Reacher-v1 not training #7

Open amolchanov86 opened 7 years ago

amolchanov86 commented 7 years ago

Hi, I just tried running Reacher-v1 for 1,000,000 timesteps with the default settings and it didn't learn anything (it just gets stuck at around -12 test reward). It looks like you got it running with some settings, though. What were those settings?
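
(For reference, one way to sanity-check a flat test reward like this is to compare it against a random-policy baseline on the same task. A minimal sketch using the gym API of that era; not code from this repo:)

```python
# Estimate the random-policy baseline return on Reacher-v1,
# using the classic gym four-tuple step() signature.
import gym
import numpy as np

env = gym.make('Reacher-v1')
returns = []
for _ in range(20):
    env.reset()
    total, done = 0.0, False
    while not done:
        obs, reward, done, info = env.step(env.action_space.sample())
        total += reward
    returns.append(total)
print('random-policy return: %.1f +/- %.1f' % (np.mean(returns), np.std(returns)))
```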

rmst commented 7 years ago

Hey,

sorry for the late reply! The most important setting, reward normalization, is actually hardcoded into filter_env.py for Reacher-v1. The other hyperparameters etc. should be fine. Have you tried multiple times? Do at least the two pendulum tasks work?
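
(For anyone reading along, hardcoded reward normalization in an env wrapper can be sketched roughly as below. This is illustrative only, not the repo's actual filter_env.py, and the 0.1 scale factor is an assumption:)

```python
import gym

class RewardScaledEnv(object):
    """Pass-through env wrapper that multiplies every reward by a fixed scale."""
    def __init__(self, env, scale):
        self.env = env
        self.scale = scale
        self.observation_space = env.observation_space
        self.action_space = env.action_space

    def reset(self):
        return self.env.reset()

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        # Rescale the reward before it reaches the learner.
        return obs, reward * self.scale, done, info

# Hypothetical usage; the per-task scale is an assumption, not the repo's value.
env = RewardScaledEnv(gym.make('Reacher-v1'), scale=0.1)
```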

Cheers Simon

amolchanov86 commented 7 years ago

Hi, thanks for the reply!

rmst commented 7 years ago

Hey, sorry for the late reply.

I never got Reacher-v1 to "solve", but it came close (as you can see in the gif in the readme). For my evaluations I used the commit before "fixes in replay memory", but I don't believe the performance got worse after that commit. I don't use prioritized experience replay; the list of improvements is only a roadmap. I haven't had time to work on it so far, and it now doesn't seem like such a big improvement compared to other things like auxiliary tasks in A3C and so on. Maybe I will release a new TensorFlow deep RL repo, though, where we can include it.
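
(For context, the uniform, non-prioritized replay memory that DDPG typically uses looks roughly like the sketch below. This is a generic illustration, not this repo's implementation:)

```python
import random
from collections import deque

class ReplayMemory(object):
    """Uniform replay memory: every stored transition is equally likely to be sampled."""
    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)  # oldest transitions drop off automatically

    def add(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling, in contrast to prioritized experience replay,
        # which would weight transitions by TD error.
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)
```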

Ah, and no, I haven't used it with convolutional nets on pixels yet. But that should also come soon (in the new repo, though).

Cheers

amolchanov86 commented 7 years ago

Hi, thanks for the help!