We need to recalculate `action_prob` to get its gradient for the update of the `rnn`, the policy network.
The saved weights are just scalars, not tensors, so they carry no gradient, and the policy network cannot be updated without one.
You could try saving the weights as tensors instead, but I'm not sure that would work.
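
For illustration, here is a minimal sketch (not the repo's actual code; the network, optimizer, and tensor shapes are simplified assumptions) of why the probabilities are recomputed inside the training step: the values stored during rollout are detached Python floats, while the loss needs tensors that are still connected to the policy network's computation graph.

```python
import torch
import torch.nn as nn

# Toy policy network standing in for the repo's rnn (simplified assumption).
policy = nn.Sequential(nn.Linear(4, 16), nn.ReLU(), nn.Linear(16, 3))
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)

obs = torch.randn(1, 4)        # one observation collected during rollout
action = torch.tensor([1])     # the action that was chosen
ret = torch.tensor(5.0)        # the (discounted) return for that step

# --- What would get stored in the episode during rollout ---
with torch.no_grad():
    probs_rollout = torch.softmax(policy(obs), dim=-1)
stored_prob = probs_rollout[0, action].item()   # a plain float: no graph, no gradient

# --- Training: recompute the probability so it is attached to the graph ---
probs = torch.softmax(policy(obs), dim=-1)       # fresh forward pass through the network
log_prob = torch.log(probs.gather(1, action.unsqueeze(1)))
loss = -(log_prob * ret).mean()                  # REINFORCE-style policy-gradient loss

optimizer.zero_grad()
loss.backward()                                  # gradients flow into the policy parameters
optimizer.step()

# Using stored_prob here instead would not work: it is detached from the network's
# parameters, so there would be no gradient to propagate back into the policy.
```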
Thank you for your work. It's super helpful for beginners like me. I just have a question about getting the action weights.

When we generate episodes with the `rolloutWorker`, we already have the action weights before we choose the actions (https://github.com/starry-sky6688/StarCraft/blob/2c07045f294ad4eeb5ab8a8d25cf43d0efea4cb3/common/rollout.py#L180), but when we calculate the loss in the `agent` during training, we calculate those action weights again (https://github.com/starry-sky6688/StarCraft/blob/2c07045f294ad4eeb5ab8a8d25cf43d0efea4cb3/policy/reinforce.py#L79). Is there any reason why we do this instead of just putting those weights into the `episode`?

Thank you very much for your time.