stevenpjg / ddpg-aigym

Continuous control with deep reinforcement learning - Deep Deterministic Policy Gradient (DDPG) algorithm implemented in OpenAI Gym environments
MIT License

Running the code on the "Reacher" task #11

Closed cardwing closed 7 years ago

cardwing commented 7 years ago

Hi, Steven! Recently, I downloaded your code and tested it on the "Reacher" task. However, I found that with GPU-based TensorFlow it only runs about 200 episodes per day, which seems a bit slow. Is there anything I need to adjust to speed up the process? (I noticed that GPU usage is low, around 3%~10%, so maybe the GPU is not being used fully.) Also, you mentioned that we could use one more wrapper to scale the reward; can you explain that more specifically? Thanks a lot!

stevenpjg commented 7 years ago

When I trained on the inverted pendulum, it took 12 hours to generate 400 episodes of the learning curve (with batch normalization) and 12 hours to generate 1000 episodes (without batch normalization), as shown in the link.

Regarding the slow GPU problem: since GPU usage is only 3-10%, I assume TensorFlow is not actually using the GPU. Did you check whether TensorFlow is running on the GPU? Recent versions of the GPU TensorFlow package silently fall back to the CPU if CUDA and cuDNN are not properly set up; I had a similar problem recently. You can check device placement with: sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))

The slowness is also due to the batch gradient update performed at every time step. I also noticed that the update step was fast during the first few hours of training. Contributions are welcome if you have a faster implementation.
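For reference, here is a minimal sketch of such a check, assuming the TensorFlow 1.x API this repo was written against (the constant tensors are just placeholders to force an op placement):

```python
import tensorflow as tf
from tensorflow.python.client import device_lib

# List the devices TensorFlow can see; a working GPU build should
# report a /device:GPU:0 entry alongside the CPU.
print(device_lib.list_local_devices())

# Log where each op is placed; the log should show GPU:0 for the add op
# if CUDA/cuDNN are correctly integrated.
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
a = tf.constant([1.0, 2.0], name='a')
b = tf.constant([3.0, 4.0], name='b')
print(sess.run(a + b))
```

If the device log only ever shows CPU placements, the GPU build is falling back to the CPU and fixing the CUDA/cuDNN install should give the biggest speedup.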

As explained in issue #5, by wrapper I meant a wrapper that normalizes the values of the gym environment, i.e. scaling the states, actions, and rewards into the range 0-1.
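For anyone looking for a concrete starting point, a rough sketch of such a wrapper is below. This is not code from the repo: the name NormalizeWrapper, the reward_scale constant, and the [0, 1] scaling are illustrative assumptions, and very old gym versions may require overriding _step/_reset instead of step/reset.

```python
import gym
import numpy as np

class NormalizeWrapper(gym.Wrapper):
    """Illustrative wrapper: rescales observations to roughly [0, 1] and
    divides rewards by a task-specific constant (hypothetical reward_scale)."""

    def __init__(self, env, reward_scale=1.0):
        super(NormalizeWrapper, self).__init__(env)
        self.low = env.observation_space.low
        self.high = env.observation_space.high
        self.reward_scale = reward_scale

    def _normalize_obs(self, obs):
        # Map each bounded state dimension from [low, high] to [0, 1];
        # unbounded (infinite) dimensions are passed through unchanged.
        span = self.high - self.low
        finite = np.isfinite(span) & (span > 0)
        out = np.array(obs, dtype=np.float64)
        out[finite] = (out[finite] - self.low[finite]) / span[finite]
        return out

    def reset(self, **kwargs):
        return self._normalize_obs(self.env.reset(**kwargs))

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        return self._normalize_obs(obs), reward / self.reward_scale, done, info
```

Usage would be something like `env = NormalizeWrapper(gym.make('Reacher-v1'), reward_scale=10.0)`, with the scale chosen per task so that rewards stay in a small range for the critic.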