starry-sky6688 / MARL-Algorithms

Implementations of IQL, QMIX, VDN, COMA, QTRAN, MAVEN, CommNet, DyMA-CL, and G2ANet on SMAC, the decentralised micromanagement scenario of StarCraft II

Question about get action weights #81

Closed LXXXXR closed 2 years ago

LXXXXR commented 2 years ago

Thank you for your work. It's super helpful for beginners like me. Just a question about getting the action weights.

When we generate episodes with the rolloutWorker, we already have the action weights before we choose the actions: https://github.com/starry-sky6688/StarCraft/blob/2c07045f294ad4eeb5ab8a8d25cf43d0efea4cb3/common/rollout.py#L180

But when we calculate the loss in the agent during training, we compute those action weights again: https://github.com/starry-sky6688/StarCraft/blob/2c07045f294ad4eeb5ab8a8d25cf43d0efea4cb3/policy/reinforce.py#L79

Is there a reason why we do this instead of just storing those weights in the episode?
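For concreteness, this is roughly how I understand the rollout side (a minimal sketch with illustrative names and shapes, not the repo's exact code): the probabilities are only needed to sample an action, so they are computed without gradient tracking and whatever ends up in the episode buffer is plain data.

```python
import torch
import torch.nn as nn

# Toy stand-ins (illustrative only, not the repo's classes or shapes).
obs_dim, n_actions, hidden_dim = 8, 4, 32
policy_rnn = nn.GRUCell(obs_dim, hidden_dim)
head = nn.Linear(hidden_dim, n_actions)

obs = torch.randn(1, obs_dim)
hidden = torch.zeros(1, hidden_dim)
episode = {"obs": [], "action": [], "action_prob": []}

# Rollout side: probabilities are only used to sample an action,
# so no gradients are tracked here.
with torch.no_grad():
    hidden = policy_rnn(obs, hidden)
    action_prob = torch.softmax(head(hidden), dim=-1)
    action = torch.multinomial(action_prob, 1).item()

episode["obs"].append(obs)
episode["action"].append(action)
episode["action_prob"].append(action_prob.numpy())  # detached numbers, no grad_fn
```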

Thank you very much for your time.

starry-sky6688 commented 2 years ago

We need to recalculate action_prob so that it carries gradients for updating the RNN, i.e. the policy network.

The weights stored in the episode are just scalars, not tensors with gradients, and the policy network cannot be updated without gradients.

Maybe you could try saving the weights as tensors, but I'm not sure that would work.
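To illustrate the point, here is a minimal, self-contained sketch of the training side (illustrative names, not the repo's exact code): the loss has to be built from a fresh forward pass through the policy network so that backward() can reach its parameters.

```python
import torch
import torch.nn as nn

# Toy stand-ins (illustrative only, not the repo's classes or shapes).
obs_dim, n_actions, hidden_dim = 8, 4, 32
policy_rnn = nn.GRUCell(obs_dim, hidden_dim)
head = nn.Linear(hidden_dim, n_actions)
optim = torch.optim.Adam(
    list(policy_rnn.parameters()) + list(head.parameters()), lr=5e-4
)

# Pretend these came out of the episode buffer.
obs = torch.randn(10, obs_dim)                 # 10 timesteps of observations
actions = torch.randint(0, n_actions, (10,))   # actions taken during rollout
returns = torch.randn(10)                      # discounted returns / advantages

# Training side: recompute action_prob with gradients enabled, so the
# log-probabilities stay attached to the policy network's parameters.
hidden = torch.zeros(1, hidden_dim)
log_probs = []
for t in range(obs.shape[0]):
    hidden = policy_rnn(obs[t:t + 1], hidden)
    probs = torch.softmax(head(hidden), dim=-1)
    log_probs.append(torch.log(probs[0, actions[t]] + 1e-10))

loss = -(torch.stack(log_probs) * returns).mean()  # REINFORCE-style objective
optim.zero_grad()
loss.backward()   # gradients flow back into policy_rnn and head
optim.step()

# If the probabilities saved during rollout were used here instead, they would
# be plain numbers with no grad_fn, and loss.backward() would update nothing.
```

As a general PyTorch observation (not specific to this repo): keeping graph-attached tensors from rollout would mean holding the whole episode's computation graph in memory until the update, which is why recomputing the forward pass at training time is the usual pattern.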