jonathan-laurent / AlphaZero.jl

A generic, simple, and fast implementation of DeepMind's AlphaZero algorithm.
https://jonathan-laurent.github.io/AlphaZero.jl/stable/
MIT License

fix: Ternary Statistics computation #182

Closed AndrewSpano closed 1 year ago

AndrewSpano commented 1 year ago

This PR resolves #177 by changing the value of gamma passed to the rewards_and_redundancy() function for environments that have ternary rewards:

https://github.com/jonathan-laurent/AlphaZero.jl/blob/66eaed8e4d8f60f8d535d949e5447b5c5f821ee8/src/simulations.jl#L292
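For intuition on why this matters: with ternary rewards the outcome of a game is a single +1, 0, or -1 at the end, so any gamma < 1 deflates the reported result. A minimal sketch of the effect (illustrative names only, not the library's API):

    # Discounted return of a trace of per-step rewards (illustrative helper).
    discounted_return(rewards, gamma) =
        sum(gamma^(t - 1) * r for (t, r) in enumerate(rewards))

    rewards = [0.0, 0.0, 0.0, 1.0]    # a win reached after four moves
    discounted_return(rewards, 0.99)  # ≈ 0.970: the win is under-reported
    discounted_return(rewards, 1.0)   # = 1.0: the ternary outcome is preserved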

Specifically, the changes that have been made are:

  1. In the run() function of benchmark.jl, the following line was added to check whether the current environment has ternary rewards; if so, gamma == 1 is used for the report.Evaluation rewards:

    gamma = env.params.ternary_rewards ? 1. : env.params.self_play.mcts.gamma

  2. In training.jl, the evaluation functions (including compare_networks()) now take an extra argument eval_gamma that is used to compute the rewards in rewards_and_redundancy(). This value is computed in learning_step!(), before compare_networks() is invoked (a rough sketch of this plumbing follows the list):

      eval_gamma = env.params.ternary_rewards ? 1. : env.params.self_play.mcts.gamma
      eval_report =
        compare_networks(env.gspec, env.curnn, env.bestnn, ap, handler, eval_gamma)
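A rough sketch of how the extra argument might be threaded down to rewards_and_redundancy() (simplified body with a hypothetical game-playing helper, not the actual code in training.jl):

    function compare_networks(gspec, curnn, bestnn, params, handler, eval_gamma)
        # Play evaluation games between the current and the best network
        # (play_evaluation_games is a hypothetical stand-in for the real logic).
        samples = play_evaluation_games(gspec, curnn, bestnn, params, handler)
        # Use eval_gamma (forced to 1 for ternary rewards) instead of the
        # self-play gamma when summarizing the evaluation games.
        rewards, redundancy = rewards_and_redundancy(samples, gamma=eval_gamma)
        # ... assemble and return the evaluation report ...
    end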
jonathan-laurent commented 1 year ago

This PR includes an extra commit from another PR (GPU implementation of Connect Four). Otherwise, it looks good! I was thinking of a simpler approach, such as simply printing an error when the user specifies ternary_rewards with gamma != 1, but this approach is interesting too.
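For reference, that simpler alternative could look roughly like the following parameter check (a hypothetical helper, not existing library code):

    # Reject inconsistent parameters up front instead of overriding gamma later.
    function check_params(params)
        if params.ternary_rewards && params.self_play.mcts.gamma != 1.0
            error("ternary_rewards=true requires self_play.mcts.gamma == 1")
        end
        return nothing
    end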

AndrewSpano commented 1 year ago

Will create another PR that contains only the changes of the latest commit, closing this one.