Closed wenshuaizhao closed 1 year ago
The win rate is the number of won episodes divided by the total number of episodes, so we only need to take the final score reward of an episode.
From the gfootball code, info['score_reward'] = score_reward and score_reward = reward, and the reward is defined as reward = score_diff - self._state.previous_score_diff. It seems the final score reward is just the reward of the last step, when the episode is done. How can it be the final goal difference?
Hope to get your further explanation. Thanks!
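To make the concern concrete, here is a minimal sketch of the reward definition quoted above. It is not gfootball code: `step_rewards` and the goal-difference trajectory are hypothetical, and the episode is assumed to continue after a goal. Under that assumption the last step reward and the accumulated goal difference can disagree.

```python
def step_rewards(score_diffs):
    """Per-step reward as quoted above: reward = score_diff - previous_score_diff."""
    previous_score_diff = 0
    rewards = []
    for score_diff in score_diffs:
        rewards.append(score_diff - previous_score_diff)
        previous_score_diff = score_diff
    return rewards


# Hypothetical goal difference after each step of one multi-goal episode:
# we score twice, then concede once on the final step.
score_diffs = [0, 1, 2, 2, 1]

rewards = step_rewards(score_diffs)
final_step_reward = rewards[-1]      # reward of the last step only
accumulated_goal_diff = sum(rewards)  # equals the final goal difference

print(rewards, final_step_reward, accumulated_goal_diff)
```

Here the final step reward is negative although the episode ends with a positive goal difference, which is why the last-step value alone would not identify the winner in a multi-goal episode.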
Hi, I now see that in the academy scenarios the episode ends after the first goal, so it is fine to use the final step's score_reward to indicate a win. I don't have any further questions. Thanks!
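A minimal sketch of the resulting win-rate calculation, assuming every episode ends on the first goal (or times out with no goal), so the final step's score_reward is also the final goal difference. The helper `win_rate` and the sample values are hypothetical and are not taken from the repository.

```python
def win_rate(final_score_rewards):
    """Fraction of episodes whose final-step score_reward is positive."""
    if not final_score_rewards:
        return 0.0
    wins = sum(1 for reward in final_score_rewards if reward > 0)
    return wins / len(final_score_rewards)


# Hypothetical final-step score_reward for four evaluation episodes:
# +1 = we scored, -1 = the opponent scored, 0 = the episode timed out.
final_score_rewards = [1, 0, -1, 1]

eval_win_rate = win_rate(final_score_rewards)
print(eval_win_rate)
```

Counting only strictly positive values treats timeouts and conceded goals alike as non-wins, which matches the rule of defining a win as a positive score_reward.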
Hi, thanks for the contribution. I am not clear about how 'eval_win_rates' is calculated for GRF in the code. It currently seems to count an episode as a win (1) if 'score_reward' is positive.
However, if I understand correctly, env_info['score_reward'] is just the step reward, not the accumulated goal difference. I think I am missing something; could you help me figure it out?
Thanks!