Need help to understand how grad-inv accelerates the learning process #4

stevenpjg / ddpg-aigym

Continuous control with deep reinforcement learning - Deep Deterministic Policy Gradient (DDPG) algorithm implemented in OpenAI Gym environments

MIT License

272 stars, 74 forks

The gradient inverter is actually meant for enforcing bounds on the parameter space. Gradients are downscaled as a parameter approaches its bound and are inverted if the parameter exceeds the value range. My interpretation of the speedup in learning: think of the grad inverter as a threshold on the gradients, which reduces gradient noise. Also, empirically I found an increase in learning speed when I used the grad inverter. For more details, you can refer to the paper (link).
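To make the downscale-then-invert behaviour concrete, here is a minimal pure-Python sketch. It is not the code from this repo: the function name `invert_gradients`, the list-based interface, and the convention that a positive gradient means "increase the value" are all assumptions for illustration.

```python
def invert_gradients(grads, values, lower, upper):
    """Rescale each gradient by the headroom left toward the bound it points at.

    Assumption: a positive gradient asks to increase the value (gradient ascent).
    Inside the bounds the scale lies in [0, 1], so the gradient shrinks near a bound.
    Outside the bounds the scale goes negative, so the gradient is inverted.
    """
    adjusted = []
    for g, v, lo, hi in zip(grads, values, lower, upper):
        width = hi - lo
        if g > 0:
            # Moving up: scale by the remaining distance to the upper bound.
            scale = (hi - v) / width
        else:
            # Moving down: scale by the remaining distance to the lower bound.
            scale = (v - lo) / width
        adjusted.append(g * scale)
    return adjusted


if __name__ == "__main__":
    bounds_lo, bounds_hi = [-1.0], [1.0]
    for value in (0.0, 0.9, 1.5):
        print(value, invert_gradients([1.0], [value], bounds_lo, bounds_hi))
```

With bounds of [-1, 1], a gradient of 1.0 is halved at the centre, nearly vanishes at 0.9, and flips sign at 1.5, which pushes the value back into range.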

I used an NVIDIA GTX 960M to train. It took me around 10 hours to generate the learning curve in https://github.com/stevenpjg/ddpg-aigym/blob/master/learning_curve.png

Yes, it works for Reacher-v1 now, but it takes a while. You can accelerate it with a small wrapper that provides normalized environments and reward scaling, as mentioned here (link).
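A rough sketch of what such a wrapper could look like is below. It is not the wrapper from the linked discussion: the class name `NormalizedEnv`, the `reward_scale` default of 0.1, and the assumption that the wrapped environment exposes `reset()` and a `step(action)` returning `(obs, reward, done, info)` are all hypothetical. It only normalizes actions and scales rewards; observation normalization is left out.

```python
class NormalizedEnv:
    """Hypothetical wrapper: the agent acts in [-1, 1], rewards are rescaled.

    Assumes `env` has reset() and step(action) -> (obs, reward, done, info),
    and that action_low / action_high list the true per-dimension bounds.
    """

    def __init__(self, env, action_low, action_high, reward_scale=0.1):
        self.env = env
        self.action_low = list(action_low)
        self.action_high = list(action_high)
        self.reward_scale = reward_scale  # illustrative value, tune per task

    def reset(self):
        return self.env.reset()

    def step(self, action):
        # Clip the agent's action to [-1, 1] before mapping it.
        clipped = [max(-1.0, min(1.0, a)) for a in action]
        # Map [-1, 1] linearly onto [low, high] for each action dimension.
        scaled = [
            lo + (a + 1.0) * 0.5 * (hi - lo)
            for a, lo, hi in zip(clipped, self.action_low, self.action_high)
        ]
        obs, reward, done, info = self.env.step(scaled)
        return obs, reward * self.reward_scale, done, info
```

The idea is that the actor's output layer can then always target [-1, 1], and the critic sees rewards in a smaller, more stable range.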
