Open · Tord-Zhang opened this issue 7 years ago
Hi Mangdian,
In net2:reinforce(rew) we are passing the calculated advantage to the module; the reinforce() method stores the passed value as the reward. Then net2:backward(states[j]) computes and accumulates the gradients for that particular input state. It is not necessary to pass a gradient to backward() here, because the REINFORCE module builds its gradient from the stored reward rather than from a gradOutput.
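To make that concrete, here is a minimal sketch of the same pattern, not the actual network from this repo, assuming Torch7 with the nn and dpnn packages (dpnn is what provides Module:reinforce() and nn.ReinforceCategorical); the layer sizes, advantage value, and learning rate are made up:

```lua
require 'nn'
require 'dpnn'  -- adds Module:reinforce() and the Reinforce* stochastic modules

-- Toy policy network: 4 state features -> 2 action probabilities.
local net2 = nn.Sequential()
net2:add(nn.Linear(4, 2))
net2:add(nn.SoftMax())
net2:add(nn.ReinforceCategorical())  -- samples an action; ignores gradOutput on backward

local state  = torch.randn(1, 4)     -- batch of one state
local action = net2:forward(state)   -- one-hot encoding of the sampled action

-- Pretend the advantage for this step was computed elsewhere (one entry per batch row).
local advantage = torch.Tensor{1.5}

net2:zeroGradParameters()
net2:reinforce(advantage)            -- store the advantage as the reward inside the module
net2:backward(state)                 -- no gradOutput needed: the REINFORCE gradient is
                                     -- built from the stored reward and the sampled action
net2:updateParameters(0.01)          -- one gradient step
```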
There is one error in this code: it is backpropagating gradients through SoftMax rather than LogSoftMax. I will clean up the code and change this part. I really appreciate you reading through my spaghetti code, and apologies for the delayed reply. Hopefully my explanation was helpful. Please let me know if you have any other doubts.
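For context, the update these two lines are meant to implement is the usual policy-gradient (REINFORCE) step, where A is the advantage and \pi_\theta the policy:

\nabla_\theta J(\theta) \approx A(s_t, a_t)\, \nabla_\theta \log \pi_\theta(a_t \mid s_t)

Because the objective involves \log \pi, the gradient has to flow through the log-probabilities (LogSoftMax) rather than the plain SoftMax output, which is why the above is a bug.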
Thanks, Sachin
First of all, I would like to express my appreciation for your work. I just read your A3C code and found your implementation quite different from others, mainly because of the two lines below:
net2:reinforce(rew)
net2:backward(states[j])
Could you please explain how this works?