stevenpjg / ddpg-aigym

Continuous control with deep reinforcement learning - Deep Deterministic Policy Gradient (DDPG) algorithm implemented in OpenAI Gym environments
MIT License

Running the code on the "Reacher" task #11

Closed cardwing closed 7 years ago

cardwing commented 7 years ago

Hi, Steven! Recently, I downloaded your code and tested it on the "Reacher" task. However, I found that with GPU-based TensorFlow it only runs about 200 episodes per day, which seems a bit slow. Is there anything I need to adjust to speed up the process? (I noticed that GPU usage is low, around 3%~10%, so maybe the GPU is not being used fully.) Also, you mentioned that we could use one more wrapper to scale the reward; can you explain that more specifically? Thanks a lot!

stevenpjg commented 7 years ago

When I trained on the inverted pendulum, it took 12 hours to generate 400 episodes of the learning curve (with batch normalization) and 12 hours to generate 1000 episodes (without batch normalization), as shown in the link.

Regarding the slow GPU problem: since GPU usage is only 3-10%, I assume TensorFlow is not actually using the GPU. Did you check whether TensorFlow is running on the GPU? Recent versions of the GPU TensorFlow package silently fall back to the CPU if CUDA and cuDNN are not properly set up; I had a similar problem recently. You can check device placement with: sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))

The slowness is also due to the batch gradient update performed at every time step. I also noticed that the update step was fast during the first few hours of training. Contributions are welcome if you have a faster implementation.
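For reference, here is a minimal sketch of such a check, assuming the TensorFlow 1.x API this repo was written against (the constant tensors are just placeholders to force an op placement):

```python
import tensorflow as tf
from tensorflow.python.client import device_lib

# List the devices TensorFlow can see; a working GPU build should
# report a /device:GPU:0 entry alongside the CPU.
print(device_lib.list_local_devices())

# Log where each op is placed; the log should show GPU:0 for the add op
# if CUDA/cuDNN are correctly integrated.
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
a = tf.constant([1.0, 2.0], name='a')
b = tf.constant([3.0, 4.0], name='b')
print(sess.run(a + b))
```

If the device log only ever shows CPU placements, the GPU build is falling back to the CPU and fixing the CUDA/cuDNN install should give the biggest speedup.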

As explained in issue #5, by wrapper I meant a wrapper that normalizes the values of the gym environment, i.e. scaling the states, actions, and rewards into the range 0-1.
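For anyone looking for a concrete starting point, a rough sketch of such a wrapper is below. This is not code from the repo: the name NormalizeWrapper, the reward_scale constant, and the [0, 1] scaling are illustrative assumptions, and very old gym versions may require overriding _step/_reset instead of step/reset.

```python
import gym
import numpy as np

class NormalizeWrapper(gym.Wrapper):
    """Illustrative wrapper: rescales observations to roughly [0, 1] and
    divides rewards by a task-specific constant (hypothetical reward_scale)."""

    def __init__(self, env, reward_scale=1.0):
        super(NormalizeWrapper, self).__init__(env)
        self.low = env.observation_space.low
        self.high = env.observation_space.high
        self.reward_scale = reward_scale

    def _normalize_obs(self, obs):
        # Map each bounded state dimension from [low, high] to [0, 1];
        # unbounded (infinite) dimensions are passed through unchanged.
        span = self.high - self.low
        finite = np.isfinite(span) & (span > 0)
        out = np.array(obs, dtype=np.float64)
        out[finite] = (out[finite] - self.low[finite]) / span[finite]
        return out

    def reset(self, **kwargs):
        return self._normalize_obs(self.env.reset(**kwargs))

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        return self._normalize_obs(obs), reward / self.reward_scale, done, info
```

Usage would be something like `env = NormalizeWrapper(gym.make('Reacher-v1'), reward_scale=10.0)`, with the scale chosen per task so that rewards stay in a small range for the critic.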