marlbenchmark / on-policy

This is the official implementation of Multi-Agent PPO (MAPPO).
https://sites.google.com/view/mappo
MIT License

gfootball win_rate calculation #66

Closed wenshuaizhao closed 1 year ago

wenshuaizhao commented 1 year ago

Hi, thanks for the contribution. I am not clear on how the `eval_win_rates` for GRF are calculated in the code. It seems a win is counted as 1 whenever the `score_reward` is positive.

However, if I understand correctly, `env_info['score_reward']` is just the per-step reward, not the accumulated goal difference. I think I am missing something; could you help me figure it out?

Thanks!

zoeyuchao commented 1 year ago

The win rate is winning episodes / all episodes, so we only need to take the final score reward of each episode.
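A minimal sketch of the evaluation logic described above (not the repo's exact code; the episode/info-dict structure here is a simplifying assumption): an episode counts as a win when the `score_reward` reported at its final step is positive.

```python
# Hedged sketch, not the actual on-policy implementation: each episode is
# represented as a list of per-step info dicts containing 'score_reward'.
def eval_win_rate(episodes):
    """Count an episode as a win if 'score_reward' at the last step is > 0."""
    wins = sum(1 for ep in episodes if ep[-1]["score_reward"] > 0)
    return wins / len(episodes)

episodes = [
    [{"score_reward": 0}, {"score_reward": 1}],   # scored on the final step -> win
    [{"score_reward": 0}, {"score_reward": 0}],   # no goal -> not a win
    [{"score_reward": 0}, {"score_reward": -1}],  # conceded -> not a win
]
print(eval_win_rate(episodes))  # -> 0.3333...
```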

wenshuaizhao commented 1 year ago

From the gfootball code, `info['score_reward'] = score_reward`, `score_reward = reward`, and the reward is defined as `reward = score_diff - self._state.previous_score_diff`. It seems the final score reward is just the reward of the last step when the episode is done? How can that be the final goal difference?
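A small numeric illustration of the concern above, assuming the reward definition quoted from gfootball (`reward = score_diff - previous_score_diff`; the goal-difference trajectory below is made up): each step's `score_reward` is the *change* in goal difference, so the last step's reward alone need not equal the accumulated goal difference.

```python
# Hypothetical goal difference over one episode (an assumption for illustration).
score_diffs = [0, 0, 1, 1, 2, 1]

# Per the quoted gfootball code: reward = score_diff - previous_score_diff.
step_rewards = [cur - prev for prev, cur in zip([0] + score_diffs[:-1], score_diffs)]

print(step_rewards)       # [0, 0, 1, 0, 1, -1] -- per-step score_reward
print(sum(step_rewards))  # 1  -- accumulated rewards = final goal difference
print(step_rewards[-1])   # -1 -- last-step reward alone is NOT the goal difference
```

So in a full-game setting, only the *sum* of `score_reward` over the episode recovers the goal difference, which is exactly why the final-step check only works when the episode terminates at the first goal.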

Hope to get your further explanation. Thanks!

wenshuaizhao commented 1 year ago

Hi, now I see that in academy scenarios the episode ends after one score, so it is fine to use the final-step `score_reward` to indicate a win. I don't have any more questions. Thanks!