Closed akopf82 closed 3 years ago
Hi,
I understand this is confusing. For two-player games, the total reward
is the sum of both players' rewards over the entire game. In tic-tac-toe there is necessarily a winner, who receives a reward of 10, so the total always ends up at 10. This explains what we see.
On the other hand, the MuZero reward and opponent reward plots are interesting. I have the impression that MuZero wins a little more frequently towards the end of your run.
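To make the point about the plots concrete, here is a minimal sketch of how the three curves relate. The function name `split_rewards` and its arguments are hypothetical and are not taken from the repository's code; it assumes each reward is credited to the player who made the move, and that a win is worth 10.

```python
def split_rewards(reward_history, to_play_history, muzero_player):
    """Return (total, muzero, opponent) rewards for one finished game.

    Assumption: reward_history[i] is credited to the player
    to_play_history[i] who made move i.
    """
    total = sum(reward_history)
    muzero = sum(
        reward
        for reward, player in zip(reward_history, to_play_history)
        if player == muzero_player
    )
    opponent = total - muzero
    return total, muzero, opponent


# Hypothetical game A: player 0 (MuZero) makes the winning move.
game_a = split_rewards([0, 0, 0, 0, 10], [0, 1, 0, 1, 0], muzero_player=0)

# Hypothetical game B: player 1 (the opponent) makes the winning move.
game_b = split_rewards([0, 0, 0, 10], [0, 1, 0, 1], muzero_player=0)

print(game_a)  # (10, 10, 0)
print(game_b)  # (10, 0, 10)
```

In both games the total is 10 regardless of who won, so a flat total reward curve says nothing about playing strength. Only the split between the MuZero reward and the opponent reward shows whether the agent is improving.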
The total reward of Connect4 converges to 10 and stays constant afterwards. Even after 100k training steps the game performance still seems random.