Closed: zwfightzw closed this issue 4 years ago
Hi,

Sorry for the confusion. The rewards reported in the paper are averaged across agents and summed over timesteps. So if we call the rewards you calculated rews, the values reported in the paper are (rews / nagents) * nsteps.
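For concreteness, a minimal sketch of that conversion in Python; the numeric values below are placeholders for illustration, not values from the paper or the codebase:

```python
# Sketch of the conversion described above; all numbers are placeholders.
nagents = 8      # number of agents in the scenario (placeholder)
nsteps = 100     # episode_length used in the experiments
rews = 0.4       # reward summed across agents, per step (placeholder)

# Paper metric: average across agents, summed over the episode's timesteps.
paper_reward = (rews / nagents) * nsteps
```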
Thanks for sharing the source code of MAAC. This is a very interesting paper. When I reproduce the experiments, the result for Cooperative Treasure Collection is quite different from the paper's. The parameter episode_length is 100, and the source code reports statistics over 100 steps for each agent:

```python
# From the replay buffer in the MAAC repository (as quoted in this issue).
def get_average_rewards(self, N):
    if self.filled_i == self.max_steps:
        inds = np.arange(self.curr_i - N, self.curr_i)  # allow for negative indexing
    else:
        inds = np.arange(max(0, self.curr_i - N), self.curr_i)
    return [self.rew_buffs[i][inds].sum() for i in range(self.num_agents)]
```

Therefore, I sum the values across agents to get the result shown in the figure (a sketch of this summation follows below). So I want to know how the results in the original paper were calculated. Hope for your reply!
Wei Zhou, zhouwei14@nudt.edu.cn
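For clarity, a minimal sketch of the summation described in the question, combined with the conversion from the reply above. The per-agent values here are illustrative placeholders standing in for the output of get_average_rewards, not numbers from an actual run:

```python
# Hypothetical per-agent values, as would be returned by
# replay_buffer.get_average_rewards(episode_length); placeholders only.
per_agent = [0.04, 0.05, 0.03, 0.06]  # one entry per agent (illustrative)
ep_len = 100                          # episode_length from the question

# Summing across agents gives the quantity plotted in the question's figure.
rews = sum(per_agent)

# Per the reply above: average across agents, then sum over timesteps.
paper_reward = (rews / len(per_agent)) * ep_len
print(paper_reward)
```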