Open StevenYuan666 opened 2 years ago
def reward(self, agent, world):
    # squared distance from listener to landmark
    a = world.agents[0]
    dist2 = np.sum(np.square(a.goal_a.state.p_pos - a.goal_b.state.p_pos))
    return -dist2
But world.agents[0] is the speaker
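Never mind. In reset_world, goal_a and goal_b are set on the speaker: goal_a is the listener (world.agents[1]) and goal_b is a randomly chosen landmark. So the reward reads both goals from the speaker but still measures the listener-to-landmark distance.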
world.agents[0].goal_a = world.agents[1]
world.agents[0].goal_b = np.random.choice(world.landmarks)
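To double-check this, here is a minimal sketch that rebuilds the setup with stand-in objects rather than the real environment classes. `make_entity` and the positions are hypothetical, and the landmark is fixed instead of drawn with `np.random.choice` so the result is reproducible. It suggests the reward depends only on the listener and landmark positions, not on where the speaker is.

```python
import numpy as np
from types import SimpleNamespace


def make_entity(pos):
    # Hypothetical stand-in for an agent or landmark; only state.p_pos is modeled.
    return SimpleNamespace(state=SimpleNamespace(p_pos=np.array(pos, dtype=float)))


speaker = make_entity([0.0, 0.0])
listener = make_entity([1.0, 2.0])
landmarks = [make_entity([4.0, 6.0]), make_entity([-1.0, 0.5])]
world = SimpleNamespace(agents=[speaker, listener], landmarks=landmarks)

# Same assignments as in reset_world, but with a fixed landmark (assumption)
world.agents[0].goal_a = world.agents[1]
world.agents[0].goal_b = world.landmarks[0]


def reward(world):
    # Goals are read from the speaker, but they point at the listener and a landmark
    a = world.agents[0]
    dist2 = np.sum(np.square(a.goal_a.state.p_pos - a.goal_b.state.p_pos))
    return -dist2


# Negative squared listener-to-landmark distance, computed directly for comparison
direct = -np.sum(np.square(listener.state.p_pos - landmarks[0].state.p_pos))
```

Moving the speaker (for example, `speaker.state.p_pos = np.array([9.0, 9.0])`) leaves `reward(world)` unchanged in this sketch, since the speaker's own position never enters the computation.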