hungtuchen / pytorch-dqn

Deep Q-Learning Network in pytorch (not actively maintained)
MIT License
383 stars 109 forks

Fixes for issues #3, #5 and #7. Agent learns better #8

[Open] praveen-palanisamy opened this pull request 6 years ago

praveen-palanisamy commented 6 years ago

Below is a summary of the contributions made by this PR:

SSARCandy commented 6 years ago

Thanks for your contribution! It works on PyTorch 0.2 👍

BTW, can I ask why you apply gradient clipping? Does it matter for performance? Thanks again for your code :)

praveen-palanisamy commented 6 years ago

Glad to hear that my contributions helped you.

Clipping the gradient makes sure the gradients don't "explode", which is a common problem when training neural networks with gradient-descent methods. In the case of DQN, gradient clipping ensures that the optimizer only takes steps of small magnitude in the direction pointed to by the gradient. A very large descent step means a big update to the Q-value function approximation, which can throw the approximation off and keep it from converging to the optimal values.
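For illustration (this sketch is mine, not code from the PR), here are the two common clipping schemes in plain Python. In PyTorch these correspond to clamping each `param.grad` element-wise (as many DQN implementations do) and to `torch.nn.utils.clip_grad_norm_`:

```python
import math

def clip_grad_by_value(grads, clip_value=1.0):
    """Element-wise clipping: clamp every gradient component to
    [-clip_value, clip_value]. Changes both magnitude and direction."""
    return [max(-clip_value, min(clip_value, g)) for g in grads]

def clip_grad_by_norm(grads, max_norm=10.0):
    """Norm clipping: if the gradient vector's L2 norm exceeds max_norm,
    rescale the whole vector so its norm equals max_norm. The direction
    is preserved; only the step size shrinks."""
    total_norm = math.sqrt(sum(g * g for g in grads))
    if total_norm > max_norm:
        scale = max_norm / total_norm
        grads = [g * scale for g in grads]
    return grads

# An "exploding" gradient gets tamed before the optimizer update:
grads = [30.0, -40.0]             # L2 norm = 50
print(clip_grad_by_value(grads))  # [1.0, -1.0]
print(clip_grad_by_norm(grads))   # [6.0, -8.0]  (norm rescaled to 10)
```

Either way, the update applied to the Q-network stays bounded, which is what keeps one bad minibatch from wrecking the value estimates.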

Hope the explanation helps.