openai / multiagent-particle-envs

Code for a multi-agent particle environment used in the paper "Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments"
https://arxiv.org/pdf/1706.02275.pdf
MIT License
2.33k stars, 785 forks

Wrong reward in simple speaker listener #97

Open StevenYuan666 opened 2 years ago

StevenYuan666 commented 2 years ago
def reward(self, agent, world):
    # squared distance from listener to landmark
    a = world.agents[0]
    dist2 = np.sum(np.square(a.goal_a.state.p_pos - a.goal_b.state.p_pos))
    return -dist2

But world.agents[0] is the speaker, not the listener, so this does not look like the listener's distance to the landmark.

StevenYuan666 commented 2 years ago

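Never mind. In reset_world, goal_a and goal_b are set on the speaker (world.agents[0]): goal_a is the listener (world.agents[1]) and goal_b is a randomly chosen landmark. So the reward really is the negative squared distance from the listener to the goal landmark.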

    # want listener to go to the goal landmark
    world.agents[0].goal_a = world.agents[1]
    world.agents[0].goal_b = np.random.choice(world.landmarks)
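
To make the indirection concrete, here is a minimal sketch using plain-Python stand-ins rather than the real multiagent classes. The names make_entity and reset_goals are hypothetical helpers invented for this illustration; only the goal_a / goal_b assignment and the reward formula mirror the snippets quoted above. It uses a single landmark so the random choice has only one possible outcome.

```python
import random
from types import SimpleNamespace


def make_entity(name, pos):
    # Hypothetical stand-in for an agent or landmark: a name, a position,
    # and the two goal slots used by the scenario.
    return SimpleNamespace(
        name=name,
        state=SimpleNamespace(p_pos=list(pos)),
        goal_a=None,
        goal_b=None,
    )


def reset_goals(world, rng):
    # Mirrors the quoted reset_world lines: the goals live on the speaker,
    # but goal_a points at the listener.
    world.agents[0].goal_a = world.agents[1]
    world.agents[0].goal_b = rng.choice(world.landmarks)


def reward(world):
    # Mirrors the quoted reward: negative squared distance between
    # goal_a (the listener) and goal_b (the goal landmark).
    a = world.agents[0]
    diff = [p - q for p, q in zip(a.goal_a.state.p_pos, a.goal_b.state.p_pos)]
    return -sum(d * d for d in diff)


world = SimpleNamespace(
    agents=[make_entity("speaker", (5.0, 5.0)), make_entity("listener", (0.0, 0.0))],
    landmarks=[make_entity("landmark 0", (3.0, 4.0))],
)
reset_goals(world, random.Random(0))

r_before = reward(world)

# Moving the speaker does not change the reward ...
world.agents[0].state.p_pos = [-9.0, 9.0]
r_after_speaker_move = reward(world)

# ... but moving the listener onto the landmark does.
world.agents[1].state.p_pos = [3.0, 4.0]
r_at_goal = reward(world)
```

Although the reward reads from world.agents[0], the speaker's own position never enters the calculation; only the listener and the chosen landmark do.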