shariqiqbal2810 / MAAC

Code for "Actor-Attention-Critic for Multi-Agent Reinforcement Learning" ICML 2019
MIT License

About the reproduction of the experiment Cooperative Treasure Collection #15

Closed zwfightzw closed 4 years ago

zwfightzw commented 4 years ago

Thanks for sharing the source code of MAAC. This is a very interesting paper. When I reproduce the experiments, the result for Cooperative Treasure Collection is quite different from the paper's. The parameter episode_length is 100, and the source code computes the statistic over the last 100 steps for each agent:

```python
def get_average_rewards(self, N):
    if self.filled_i == self.max_steps:
        inds = np.arange(self.curr_i - N, self.curr_i)  # allow for negative indexing
    else:
        inds = np.arange(max(0, self.curr_i - N), self.curr_i)
    return [self.rew_buffs[i][inds].sum() for i in range(self.num_agents)]
```

Therefore, I sum the values of each agent to get the result shown in the attached figure (results). So, I want to know how the results in the original paper are calculated. Hoping for your reply!
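For context, here is a small self-contained sketch of the aggregation described above, using tiny placeholder buffers rather than code from the repo (the buffer sizes, agent count, and reward values are all made up for illustration):

```python
import numpy as np

# Placeholder setup: two agents, a tiny reward buffer per agent.
num_agents = 2
max_steps = 10
rew_buffs = [np.arange(max_steps, dtype=float) * 0.1 * (i + 1) for i in range(num_agents)]
curr_i, filled_i, N = 7, 7, 5  # buffer not yet full; look at the last 5 steps

# Same indexing logic as the quoted get_average_rewards method.
if filled_i == max_steps:
    inds = np.arange(curr_i - N, curr_i)  # allow for negative indexing (wrapped buffer)
else:
    inds = np.arange(max(0, curr_i - N), curr_i)
per_agent = [rew_buffs[i][inds].sum() for i in range(num_agents)]

# Quantity plotted in the attached figure: the per-agent values summed across agents.
plotted = sum(per_agent)
print(per_agent, plotted)
```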

Wei Zhou, zhouwei14@nudt.edu.cn

shariqiqbal2810 commented 4 years ago

Hi,

Sorry for the confusion. The rewards reported in the paper are averaged across agents and summed over timesteps. So, if we call the quantity you calculated `rews`, then the rewards reported in the paper are `(rews / nagents) * nsteps`.
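For concreteness, here is a minimal sketch of that conversion with placeholder numbers (the agent count and reward value below are illustrative assumptions, not values from the paper):

```python
nagents = 8    # placeholder: number of agents in the environment
nsteps = 100   # episode_length used in this issue
rews = 2.5     # placeholder: the quantity computed above, summed across agents

# Paper-reported reward: average across agents, summed over timesteps.
paper_reward = (rews / nagents) * nsteps
print(paper_reward)  # 31.25 for these placeholder numbers
```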