starry-sky6688 / MARL-Algorithms

Implementations of IQL, QMIX, VDN, COMA, QTRAN, MAVEN, CommNet, DyMA-CL, and G2ANet on SMAC, the decentralised micromanagement scenario of StarCraft II

A bug when choosing actions #94

Closed Chty-syq closed 2 years ago

Chty-syq commented 2 years ago

In line 63 of "rollout.py", the relevant code is

if self.args.alg == 'maven':
    action = self.agents.choose_action(obs[agent_id], last_action[agent_id], agent_id,
                                       avail_action, epsilon, maven_z, evaluate)
else:
    # BUG: evaluate is passed positionally and binds to the maven_z parameter
    action = self.agents.choose_action(obs[agent_id], last_action[agent_id], agent_id,
                                       avail_action, epsilon, evaluate)

In the else branch, evaluate is passed positionally, so it fills the maven_z parameter of choose_action instead of evaluate. The corrected code is

if self.args.alg == 'maven':
    action = self.agents.choose_action(obs[agent_id], last_action[agent_id], agent_id,
                                       avail_action, epsilon, maven_z, evaluate)
else:
    # pass evaluate by keyword so it no longer lands in the maven_z slot
    action = self.agents.choose_action(obs[agent_id], last_action[agent_id], agent_id,
                                       avail_action, epsilon, evaluate=evaluate)
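
For anyone unfamiliar with the pitfall: because maven_z comes before evaluate in the parameter list, a positional True silently binds to the wrong parameter. A minimal standalone sketch, assuming a signature roughly like the repo's (the function and values below are illustrative, not the actual code):

def choose_action(obs, last_action, agent_id, avail_actions,
                  epsilon, maven_z=None, evaluate=False):
    # hypothetical stand-in for Agents.choose_action, used only to show argument binding
    print(f"maven_z={maven_z}, evaluate={evaluate}")

choose_action('obs', 0, 0, [1, 1], 0.05, True)           # maven_z=True, evaluate=False
choose_action('obs', 0, 0, [1, 1], 0.05, evaluate=True)  # maven_z=None, evaluate=True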

After applying this fix, however, evaluation of the policy-gradient algorithms becomes extremely unstable. In line 109 of "agent.py", the code is

if epsilon == 0 and evaluate:
    action = torch.argmax(prob)                  # greedy action during evaluation
else:
    action = Categorical(prob).sample().long()   # stochastic action otherwise

I think taking the argmax of prob during evaluation is a mistake, because policy-gradient methods learn a stochastic policy $\pi$: the action probabilities are the policy itself. We should sample during evaluation as well, using the code below

action = Categorical(prob).sample().long()
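
A minimal sketch of the difference, assuming prob is the actor's output distribution over actions (the values are illustrative):

import torch
from torch.distributions import Categorical

prob = torch.tensor([0.5, 0.3, 0.2])         # action probabilities from the actor

greedy = torch.argmax(prob)                   # always picks action 0, collapsing the policy
sampled = Categorical(prob).sample().long()   # follows the learned distribution 0.5/0.3/0.2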

I have tried this and it indeed works!

starry-sky6688 commented 2 years ago

Great advice! I have removed evaluate from choose_action(); if evaluate = True in RolloutWorker, it now sets epsilon = 0.
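
A rough sketch of that change as described (not the repo's exact code): RolloutWorker maps evaluate to epsilon before calling choose_action, which no longer takes evaluate:

if evaluate:
    epsilon = 0   # greedy/low-noise behaviour is now controlled by epsilon alone
if self.args.alg == 'maven':
    action = self.agents.choose_action(obs[agent_id], last_action[agent_id], agent_id,
                                       avail_action, epsilon, maven_z)
else:
    action = self.agents.choose_action(obs[agent_id], last_action[agent_id], agent_id,
                                       avail_action, epsilon)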