jonathan-laurent / AlphaZero.jl

A generic, simple and fast implementation of Deepmind's AlphaZero algorithm.
https://jonathan-laurent.github.io/AlphaZero.jl/stable/

Issue with ternary outcome statistics #177

Closed · gwario closed this issue 1 year ago

gwario commented 1 year ago

Hi! I discovered an issue with ternary outcome statistics for duels. When gamma is set to something other than the default of 1.0, an assertion fails when the ternary outcome statistics are created (https://github.com/jonathan-laurent/AlphaZero.jl/blob/master/src/benchmark.jl#L111). I'm not sure how to deal with it...
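
To illustrate, here is a minimal sketch of how I understand the failure, assuming the statistics are counted with exact comparisons such as count(==(1), rewards) (the gamma value and game lengths below are made up for illustration):

gamma = 0.99
rewards = [gamma^20 * 1.0, 0.0, gamma^35 * (-1.0)]  # discounted final rewards of three duel games
num_won  = count(==(1), rewards)   # 0, even though the first game was won
num_draw = count(==(0), rewards)   # 1
num_lost = count(==(-1), rewards)  # 0, even though the third game was lost
# num_won + num_draw + num_lost == 1 != length(rewards), so the assertion fails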

One option might be not to use gamma for the evaluation report (https://github.com/jonathan-laurent/AlphaZero.jl/blob/66eaed8e4d8f60f8d535d949e5447b5c5f821ee8/src/benchmark.jl#L96); another would be to change the counting behind the assertion to

num_won  = count(r -> 0 < r <= 1.0, rewards)   # any positive (discounted) reward counts as a win
num_draw = count(==(0), rewards)               # exactly zero counts as a draw
num_lost = count(r -> -1.0 <= r < 0, rewards)  # any negative (discounted) reward counts as a loss

or maybe removing the assertion altogether.

Regards

gwario commented 1 year ago

Hmm, maybe there is an obvious reason, but what is wrong with having a gamma together with ternary rewards? Isn't it just discounting z by a factor in either direction, and isn't that fine? Thanks for picking it up so quickly, though.

jonathan-laurent commented 1 year ago

I think I did not document or name ternary_rewards correctly. The intent was rather "games that give no intermediate reward and then only a final reward that can be either -1, 0 or 1". As implemented, TernaryStatistics does not make sense in the presence of intermediate rewards. Also, even in the absence of intermediate rewards, having gamma != 1 can indeed be useful to incentivise short wins.

I think the solution here should be to replace the ternary_rewards parameter with a ternary_outcome parameter. When this parameter is set to true, win-draw-loss statistics can be computed by assuming that a total reward of 0 is a draw, a positive total reward a win, and a negative total reward a loss. This solution is more elegant, more general, and agnostic to gamma.
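
As a rough sketch of what I have in mind (the function name is tentative, not the final API):

# Tentative sketch: classify each duel by the sign of its total (possibly discounted) reward.
function outcome_statistics(rewards::AbstractVector{<:Real})
  num_won  = count(>(0), rewards)
  num_draw = count(==(0), rewards)
  num_lost = count(<(0), rewards)
  # This always holds, regardless of gamma or intermediate rewards.
  @assert num_won + num_draw + num_lost == length(rewards)
  return (num_won=num_won, num_draw=num_draw, num_lost=num_lost)
end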

gwario commented 1 year ago

Thank you for clarifying... Now, looking at how rewards are propagated towards the beginning of the game, I see the issue with intermediate rewards and gamma...