sarvghotra closed this issue 8 years ago
The gradient inverter is actually meant for enforcing bounds on the parameter space. Gradients are downscaled as the parameters approach the bound and are inverted if the parameters exceed the value range. My interpretation of the acceleration in learning speed: think of the grad inverter as a threshold on the gradients, which reduces gradient noise. Empirically, I also found an increase in learning speed when I used the grad inverter. For more details, you can refer to the paper: link
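Roughly, the rule looks like this. This is only a minimal NumPy sketch of the inverting-gradients idea, not the exact code in the repo; the function name `invert_gradients` and the example bounds are illustrative, and the sign convention assumes the gradient is the ascent direction (dQ/da) used in the DDPG actor update:

```python
import numpy as np

def invert_gradients(grads, params, p_min, p_max):
    """Inverting-gradients rule (sketch).

    Gradients that push a parameter toward its bound are downscaled in
    proportion to the remaining distance to that bound; once the parameter
    lies outside [p_min, p_max] the scaling factor becomes negative, so the
    gradient is inverted and pushes the parameter back inside the range.
    """
    grads = np.asarray(grads, dtype=np.float64)
    params = np.asarray(params, dtype=np.float64)
    rng = p_max - p_min
    up = (p_max - params) / rng    # scale when the gradient would increase the parameter
    down = (params - p_min) / rng  # scale when the gradient would decrease it
    return np.where(grads > 0, grads * up, grads * down)

# Example with bounds [-1, 1]:
g = np.array([0.5, 0.5, -0.5])
p = np.array([0.9, 1.2, -0.95])  # near upper bound, beyond upper bound, near lower bound
print(invert_gradients(g, p, -1.0, 1.0))
# -> [ 0.025  -0.05   -0.0125]  (downscaled, inverted, downscaled)
```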
I used an NVIDIA GTX 960M to train; it took me around 10 hours to generate the learning curve in https://github.com/stevenpjg/ddpg-aigym/blob/master/learning_curve.png
Yes, it works for Reacher-v1 now, but it takes a while. You can accelerate it with a small wrapper that uses normalized environments and reward scaling, as mentioned in the link (see the sketch below).
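Something along these lines works as a wrapper. This is just a sketch assuming a running mean/std normalization of observations and a constant reward scale; the class name `NormalizedEnv`, the scale value 0.1, and the normalization scheme are illustrative and not necessarily what was used for the Reacher-v1 run:

```python
import gym
import numpy as np

class NormalizedEnv(object):
    """Thin wrapper: running mean/std normalization of observations
    plus a constant reward scale (sketch only)."""

    def __init__(self, env, reward_scale=0.1):
        self.env = env
        self.reward_scale = reward_scale
        self.count = 1e-4
        self.mean = np.zeros(env.observation_space.shape)
        self.var = np.ones(env.observation_space.shape)

    def _normalize(self, obs):
        # Incrementally update the running mean/variance, then standardize.
        self.count += 1.0
        delta = obs - self.mean
        self.mean = self.mean + delta / self.count
        self.var = self.var + (delta * (obs - self.mean) - self.var) / self.count
        return (obs - self.mean) / (np.sqrt(self.var) + 1e-8)

    def reset(self):
        return self._normalize(self.env.reset())

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        return self._normalize(obs), reward * self.reward_scale, done, info

# Usage (old gym API of that era):
# env = NormalizedEnv(gym.make("Reacher-v1"), reward_scale=0.1)
```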
I hope I am not troubling you too much by asking questions.
Could you please help me understand the recent changes made to accelerate learning? By the way, is it converging on Reacher-v1? Could you also mention the time taken to learn and your system configuration? Also, have a look at this paper on reward scaling; it could be a reason for divergence, just in case it is not converging.