shariqiqbal2810 / MAAC

Code for "Actor-Attention-Critic for Multi-Agent Reinforcement Learning" ICML 2019
MIT License
666 stars 172 forks

About the results #2

Closed ewanlee closed 5 years ago

ewanlee commented 5 years ago

Hello, thank you very much for open-sourcing the code for this paper. This is very good work!

When running this code, for the Cooperative Treasure Collection multi-agent environment, my results are as follows:

[screenshot of TensorBoard reward curves]

These results are quite different from the average reward reported in the paper, which is about 100, and I have not changed any parameters. Is there anything special about how the average reward is calculated?

shariqiqbal2810 commented 5 years ago

Thanks for your interest! The reward plotted in tensorboard is an average per timestep, while the plots in the paper are an average per episode. Sorry for the confusion. You can get the numbers presented in the paper by multiplying the values in tensorboard by the number of timesteps per episode (100).
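
The conversion described above can be sketched in a few lines. This is an illustrative snippet, not code from the repo; the reward values are made up, and only the episode length of 100 comes from the author's reply.

```python
# Convert per-timestep average rewards (as logged in TensorBoard)
# to per-episode averages (as plotted in the paper).

EPISODE_LENGTH = 100  # timesteps per episode, per the author's reply

# Hypothetical per-timestep values read off a TensorBoard curve
per_timestep_rewards = [0.2, 0.5, 1.0]

per_episode_rewards = [r * EPISODE_LENGTH for r in per_timestep_rewards]
print(per_episode_rewards)  # [20.0, 50.0, 100.0]
```

So a TensorBoard value of roughly 1.0 per timestep corresponds to the ~100 per-episode figure reported in the paper.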

ewanlee commented 5 years ago

I understand, thank you very much for your reply!