choasLC opened 2 years ago
Hello, I am not a contributor to this project, but maybe I can help with this problem. The actor network and the Q network are updated together by calling loss = agent.update(trainers, train_step) in train.py. It is very convenient to use, since the authors have done this for us; all we need to do is call the function.
I don't quite follow how you update the actor. From my understanding, the chain rule is required to get the gradient of the actor's parameters, since the actor loss is the Q value of the actor's own action, right? But I didn't see that in your training code. I may be wrong, but if you could give me a hint, that would be wonderful!
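To illustrate what I mean by the chain rule here, a minimal PyTorch sketch (the network shapes and names are made up, not from this repo): the actor loss is -Q(s, actor(s)), and backprop automatically applies the chain rule dL/dθ = -(dQ/da)(da/dθ) so the gradient flows through the critic into the actor's parameters.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical tiny networks, just for illustration.
obs_dim, act_dim = 4, 2
actor = nn.Sequential(nn.Linear(obs_dim, act_dim), nn.Tanh())
critic = nn.Linear(obs_dim + act_dim, 1)  # Q(s, a)

obs = torch.randn(8, obs_dim)  # batch of observations

# Actor loss: maximize Q(s, actor(s)), i.e. minimize -Q.
action = actor(obs)
q = critic(torch.cat([obs, action], dim=-1))
actor_loss = -q.mean()

# backward() applies the chain rule: the gradient reaches the
# actor's parameters only by passing through the critic first.
actor_loss.backward()

# Every actor parameter now has a gradient.
print(all(p.grad is not None for p in actor.parameters()))
```

Note that the critic's parameters also receive gradients from this loss; in a real DDPG/MADDPG update you would only step the actor's optimizer here and train the critic separately on the TD error.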