stevenpjg / ddpg-aigym

Continuous control with deep reinforcement learning - Deep Deterministic Policy Gradient (DDPG) algorithm implemented in OpenAI Gym environments
MIT License

Need help understanding how grad-inv accelerates the learning process #4

Closed sarvghotra closed 8 years ago

sarvghotra commented 8 years ago

I hope I am not troubling you too much by asking questions.

Could you please help me understand the idea behind the recent changes made to accelerate learning? By the way, is it converging on Reacher-v1? Could you also mention the time taken to learn and your system configuration? Also, take a look at this paper on reward scaling; the reward scale could be a reason for divergence, in case it is not converging.

stevenpjg commented 8 years ago

The gradient inverter is actually meant to keep the parameters within bounds. Gradients are scaled down as a parameter approaches its bound and are inverted if the parameter exceeds its value range. My interpretation of the speed-up in learning: think of the grad inverter as a threshold on the gradients, which reduces gradient noise. Also, empirically, I found an increase in learning speed when I used the grad inverter. For more details, you can refer to the paper link
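To make the description above concrete, here is a minimal NumPy sketch of that rule. It is not the code from this repository: the function name `invert_gradients`, its arguments, and the sign convention (a positive gradient means "increase the parameter") are assumptions made for illustration.

```python
import numpy as np

def invert_gradients(grad, p, p_min, p_max):
    """Scale gradients by the remaining headroom of a bounded parameter.

    Assumption: a positive gradient asks for a larger p, a negative one for a smaller p.
    """
    grad = np.asarray(grad, dtype=float)
    p = np.asarray(p, dtype=float)
    width = p_max - p_min
    # Headroom towards the upper bound; becomes negative once p > p_max.
    scale_up = (p_max - p) / width
    # Headroom towards the lower bound; becomes negative once p < p_min.
    scale_down = (p - p_min) / width
    # A negative factor flips the gradient, pushing p back into range.
    return grad * np.where(grad > 0, scale_up, scale_down)
```

With bounds of [-1, 1], a gradient of 1 is halved at p = 0, shrinks to about 0.05 at p = 0.9, and changes sign at p = 1.5, which matches the downscale-then-invert behaviour described above.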

I trained on an NVIDIA GTX 960M. It took me around 10 hours to generate the learning curve in https://github.com/stevenpjg/ddpg-aigym/blob/master/learning_curve.png

Yes, it works for Reacher-v1 now, but it takes a while. You can accelerate it with a small wrapper that provides a normalized environment and reward scaling, as mentioned here: link
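A wrapper of that kind could look roughly like the sketch below. It is only an illustration under stated assumptions, not the wrapper from the link: the class name `NormalizedEnv`, the `reward_scale` parameter, and the assumption that the wrapped environment exposes `reset()` and a `step(action)` returning `(obs, reward, done, info)` are all hypothetical choices.

```python
import numpy as np

class NormalizedEnv:
    """Hypothetical wrapper: the agent acts in [-1, 1] and sees scaled rewards."""

    def __init__(self, env, action_low, action_high, reward_scale=1.0):
        self.env = env
        self.action_low = np.asarray(action_low, dtype=float)
        self.action_high = np.asarray(action_high, dtype=float)
        self.reward_scale = reward_scale

    def reset(self):
        return self.env.reset()

    def step(self, action):
        # Clip to the normalized range, then map [-1, 1] onto [low, high].
        a = np.clip(np.asarray(action, dtype=float), -1.0, 1.0)
        span = self.action_high - self.action_low
        real_action = self.action_low + (a + 1.0) * 0.5 * span
        obs, reward, done, info = self.env.step(real_action)
        # Scale the reward before it reaches the learner.
        return obs, reward * self.reward_scale, done, info
```

The agent then always outputs actions in [-1, 1], whatever the real action bounds of the environment are, and `reward_scale` is a value you would have to tune per task.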