sachindharashivkar / Doom-Agent

A3C Agent to perform various tasks in ViZDoom environment.
Apache License 2.0

About the function reinforce() #2

Open Tord-Zhang opened 7 years ago

Tord-Zhang commented 7 years ago

First of all, I would like to express my appreciation for your work. I just read your A3C code and found that your implementation is quite different from others, mainly because of the two lines of code below: net2:reinforce(rew) net2:backward(states[j]) Could you please explain how this works?

sachindharashivkar commented 7 years ago

Hi Mangdian,

In net2:reinforce(rew), we pass the calculated advantage to the module. The reinforce() method stores the passed value as the reward. I then call the backward() method to calculate and accumulate gradients for the specific input state. It is not required to pass gradients to the backward() method in this case, because the module derives them from the stored reward.
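The store-then-backward pattern described above can be sketched in plain Python. This is a minimal illustration only, not the repository's Lua code and not the Torch API: the class name StochasticChoice, its methods, and the sample numbers are all hypothetical. It assumes the module samples an action from a probability vector and that its loss is -reward * log p[action].

```python
import random


class StochasticChoice:
    """Hypothetical module that samples an action and learns from a stored reward."""

    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        self.action = None
        self.reward = None

    def forward(self, probs):
        # Sample an action index according to the given probabilities.
        self.action = self.rng.choices(range(len(probs)), weights=probs)[0]
        return self.action

    def reinforce(self, reward):
        # Only store the value (here, the advantage); no gradient is computed yet.
        self.reward = reward

    def backward(self, probs):
        # No incoming gradient is needed: the gradient of
        # -reward * log(probs[action]) with respect to probs is built
        # from the stored reward and the sampled action.
        return [
            -self.reward / p if i == self.action else 0.0
            for i, p in enumerate(probs)
        ]


module = StochasticChoice()
probs = [0.2, 0.5, 0.3]          # made-up policy output for one state
action = module.forward(probs)
module.reinforce(2.0)            # made-up advantage
grad = module.backward(probs)
```

Only the entry for the sampled action is non-zero, which is why the caller has nothing to pass in as an output gradient.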

There is one error in this code: it backpropagates gradients through softmax rather than logsoftmax. I will clean up the code and change this part. I really appreciate you reading through my spaghetti code, and I apologize for the delayed reply. Hopefully my explanation was helpful. Please let me know if you have any other questions.
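To illustrate why the softmax versus logsoftmax distinction matters in general, here is a small Python sketch comparing the two gradients with respect to the logits. It is not a reconstruction of the repository's code; the function names and the example logits are assumptions made for illustration.

```python
import math


def softmax(logits):
    # Subtract the maximum for numerical stability.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def grad_log_prob(logits, action):
    # d log p[action] / d logits[i] = onehot[i] - p[i]
    p = softmax(logits)
    return [(1.0 if i == action else 0.0) - p[i] for i in range(len(p))]


def grad_prob(logits, action):
    # d p[action] / d logits[i] = p[action] * (onehot[i] - p[i])
    p = softmax(logits)
    return [
        p[action] * ((1.0 if i == action else 0.0) - p[i])
        for i in range(len(p))
    ]


logits = [1.0, 2.0, 0.5]   # made-up logits for one state
action = 1
g_log = grad_log_prob(logits, action)
g_lin = grad_prob(logits, action)
```

The two gradients differ by a factor of p[action], so differentiating the probability instead of the log probability shrinks the update for actions the policy currently considers unlikely. The policy gradient is defined in terms of the log probability.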

Thanks, Sachin