Closed ewanlee closed 5 years ago
Thanks for your interest! The reward plotted in tensorboard is an average per timestep, while the plots in the paper are an average per episode. Sorry for the confusion. You can get the numbers presented in the paper by multiplying the values in tensorboard by the number of timesteps per episode (100).
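A minimal sketch of the conversion described above, assuming a fixed episode length of 100 timesteps as stated; the function name is illustrative, not part of the repository's code:

```python
# Convert the per-timestep average reward logged in tensorboard
# to the per-episode average reported in the paper.
EPISODE_LENGTH = 100  # timesteps per episode, per the comment above

def per_episode_reward(per_timestep_reward, episode_length=EPISODE_LENGTH):
    """Scale a per-timestep average reward to a per-episode total."""
    return per_timestep_reward * episode_length

# Example: a tensorboard value of ~1.0 corresponds to ~100 per episode.
print(per_episode_reward(1.0))  # → 100.0
```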
I understand, thank you very much for your reply!
Hello, thank you very much for open-sourcing the code for this paper. This is excellent work!
When running the code on the Cooperative Treasure Collection multi-agent environment, my results are as follows:
These results differ considerably from the average reward of about 100 reported in the paper, and I have not changed any parameters. Is there anything special about how the average reward is calculated?