Open · Tord-Zhang opened this issue 7 years ago
Hi Mangdian,
In net2:reinforce(rew) we are passing the calculated advantage to the module; the reinforce() method stores the passed value as the reward. Then net2:backward(states[j]) computes and accumulates the gradients for that particular input state. It is not necessary to pass a gradient to backward() here, because the REINFORCE module builds its gradient from the stored reward rather than from a gradOutput.
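To make that concrete, here is a minimal sketch of the same pattern, not the actual network from this repo, assuming Torch7 with the nn and dpnn packages (dpnn is what provides Module:reinforce() and nn.ReinforceCategorical); the layer sizes, advantage value, and learning rate are made up:

```lua
require 'nn'
require 'dpnn'  -- adds Module:reinforce() and the Reinforce* stochastic modules

-- Toy policy network: 4 state features -> 2 action probabilities.
local net2 = nn.Sequential()
net2:add(nn.Linear(4, 2))
net2:add(nn.SoftMax())
net2:add(nn.ReinforceCategorical())  -- samples an action; ignores gradOutput on backward

local state  = torch.randn(1, 4)     -- batch of one state
local action = net2:forward(state)   -- one-hot encoding of the sampled action

-- Pretend the advantage for this step was computed elsewhere (one entry per batch row).
local advantage = torch.Tensor{1.5}

net2:zeroGradParameters()
net2:reinforce(advantage)            -- store the advantage as the reward inside the module
net2:backward(state)                 -- no gradOutput needed: the REINFORCE gradient is
                                     -- built from the stored reward and the sampled action
net2:updateParameters(0.01)          -- one gradient step
```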
There is one error in this code: it is backpropagating gradients through SoftMax rather than LogSoftMax. I will clean up the code and change this part. I really appreciate you reading through my spaghetti code, and apologies for the delayed reply. Hopefully my explanation was helpful. Please let me know if you have any other doubts.
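For context, the update these two lines are meant to implement is the usual policy-gradient (REINFORCE) step, where A is the advantage and \pi_\theta the policy:

\nabla_\theta J(\theta) \approx A(s_t, a_t)\, \nabla_\theta \log \pi_\theta(a_t \mid s_t)

Because the objective involves \log \pi, the gradient has to flow through the log-probabilities (LogSoftMax) rather than the plain SoftMax output, which is why the above is a bug.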
Thanks, Sachin
First of all, I would like to express my appreciation for your work. I just read your A3C code and found your implementation quite different from others, mainly because of the two lines below:
net2:reinforce(rew)
net2:backward(states[j])
Could you please explain how this works?