AndrewSpano closed this 1 year ago
This PR includes an extra commit from another PR (GPU implementation of Connect Four).
Otherwise, it looks good! I was thinking of a simpler approach, such as simply printing an error when the user specifies `ternary_rewards` with `gamma != 1`, but this approach is interesting too.
Will create another PR that contains only the changes from the latest commit, and close this one.
This PR resolves #177 by changing the value of `gamma` used in the `rewards_and_redundancy()` function for environments that have ternary rewards:
https://github.com/jonathan-laurent/AlphaZero.jl/blob/66eaed8e4d8f60f8d535d949e5447b5c5f821ee8/src/simulations.jl#L292
Specifically, the changes that have been made are:
In the `run()` function of `benchmark.jl`, the following line was added to check whether the current environment has ternary rewards; if so, `gamma == 1` is used for the `report.Evaluation` rewards:

```julia
gamma = env.params.ternary_rewards ? 1. : env.params.self_play.mcts.gamma
```
In `training.jl`, the functions `pit_networks()`, `evaluate_network()`, and `compare_networks()` now take an extra argument `eval_gamma` that is used to compute the rewards in `rewards_and_redundancy()`. This value is computed in `learning_step!()`, before `compare_networks()` is invoked:
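As a rough sketch of that computation (the surrounding function body and argument list here are hypothetical, not the actual `training.jl` code; only the `eval_gamma` selection mirrors the line quoted above from `benchmark.jl`):

```julia
# Hypothetical sketch; the actual signatures in training.jl may differ.
function learning_step!(env, handler)
    # ... network update logic ...

    # Use an undiscounted reward (gamma == 1) for evaluation when the
    # environment has ternary rewards; otherwise reuse the MCTS gamma.
    eval_gamma = env.params.ternary_rewards ? 1.0 : env.params.self_play.mcts.gamma

    # eval_gamma is then threaded through compare_networks() down to
    # rewards_and_redundancy(), so that evaluation rewards are
    # discounted consistently with the benchmark.
    compare_networks(env, handler, eval_gamma)
end
```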