openai / multiagent-particle-envs

Code for a multi-agent particle environment used in the paper "Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments"
https://arxiv.org/pdf/1706.02275.pdf
MIT License

Reward Setting for simple_tag.py #60

Closed: hiroignis closed this issue 4 years ago

hiroignis commented 4 years ago

Hello,

Thank you, OpenAI, for the amazing contributions; the papers and code help my research work a lot. While checking MADDPG on multiagent-particle-envs, I noticed that the rewards for the adversaries in simple_tag.py are always identical, and I wonder why. As far as I can tell, it is a bug (unless it is intended) in the reward function adversary_reward() in simple_tag.py (called from reward() <- _get_reward() of MultiAgentEnv in environment.py <- step() <- env.step(), for example). The bug is that each adversarial agent's reward also counts every other adversary's collisions, so their rewards are always the same, as if they shared a single reward. Please check the code below and fix it if needed.

The function in question in simple_tag.py is:

def adversary_reward(self, agent, world):
    # Adversaries are rewarded for collisions with agents
    rew = 0
    shape = False
    agents = self.good_agents(world)
    adversaries = self.adversaries(world)
    if shape:  # reward can optionally be shaped (decreased reward for increased distance from agents)
        for adv in adversaries:
            rew -= 0.1 * min([np.sqrt(np.sum(np.square(a.state.p_pos - adv.state.p_pos))) for a in agents])
    if agent.collide:
        for ag in agents:
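            # (note: this inner loop over ALL adversaries is what makes every adversary's reward identical)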
            for adv in adversaries:
                if self.is_collision(ag, adv):
                    rew += 10
    return rew

Then the last part:

    if agent.collide:
        for ag in agents:
            for adv in adversaries:
                if self.is_collision(ag, adv):
                    rew += 10
    return rew

should be:

    if agent.collide:
        for ag in agents:
            if self.is_collision(ag, agent):
                rew += 10
    return rew

With this fix, each adversary in simple_tag.py gets its own individual reward when training with MADDPG.
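A minimal self-contained check of the difference (MockAgent and the simplified is_collision below are stand-ins I wrote for illustration, not the repo's classes):

    import numpy as np

    class MockAgent:
        """Bare stand-in for the environment's Agent class."""
        def __init__(self, pos, collide=True):
            self.pos = np.array(pos, dtype=float)
            self.collide = collide

    def is_collision(a, b, min_dist=0.1):
        # simplified check; the real one uses the agents' sizes
        return np.linalg.norm(a.pos - b.pos) < min_dist

    def shared_reward(agent, good, adversaries):
        # original code: counts the collisions of ALL adversaries
        rew = 0
        if agent.collide:
            for ag in good:
                for adv in adversaries:
                    if is_collision(ag, adv):
                        rew += 10
        return rew

    def individual_reward(agent, good, adversaries):
        # proposed fix: counts only collisions involving THIS adversary
        rew = 0
        if agent.collide:
            for ag in good:
                if is_collision(ag, agent):
                    rew += 10
        return rew

    good = [MockAgent([0.0, 0.0])]
    advs = [MockAgent([0.0, 0.05]), MockAgent([1.0, 1.0])]  # one close, one far

    print([shared_reward(a, good, advs) for a in advs])      # [10, 10] -> identical
    print([individual_reward(a, good, advs) for a in advs])  # [10, 0]  -> per-agent

Thanks.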

Kimonili commented 4 years ago

Hi @hiroignis, I also noticed that. I would like to ask you something else as well. I have been training 4 adversaries and 1 good agent in the simple_tag scenario, and while the good agent receives rewards like -10.0000, -9.7257, or -17.5386, -18.1665, -19.3938, -20.0000, the adversaries always receive 0 at the same steps. I thought the -10 for the good agent was due to a collision with one of the adversaries, but at the same environment step no adversary receives +10 for that collision; as I mentioned, it is always 0.

Is this normal?

Additionally, are the good agent's rewards that do not result from a collision (-10 and -20) due to the penalty function for exiting the bounds of the environment?
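For reference, if I read agent_reward() in simple_tag.py correctly, the boundary penalty looks roughly like the sketch below (my paraphrase, not verbatim); since it is capped at 10 per coordinate and the world has two dimensions, it alone can reach -20:

    # penalty applied to the good agent for leaving the arena,
    # as I understand agent_reward() in simple_tag.py
    def bound(x):
        if x < 0.9:
            return 0                        # no penalty near the center
        if x < 1.0:
            return (x - 0.9) * 10           # linear ramp near the edge
        return min(np.exp(2 * x - 2), 10)   # grows outside, capped at 10

    for p in range(world.dim_p):            # dim_p == 2, so at most -20 total
        x = abs(agent.state.p_pos[p])
        rew -= bound(x)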

Thank you in advance!