gwario closed this issue 1 year ago.
Hmm, maybe there's an obvious reason, but what is wrong with combining gamma and ternary rewards? Isn't it just discounting z by a factor in either direction, which should be fine? Thanks for picking this up so quickly, though.
I think I did not document or name ternary_rewards correctly. The intent was rather “games that give no intermediate reward, only a final reward that is either -1, 0 or 1”. As implemented, TernaryStatistics does not make sense in the presence of intermediate rewards. Also, even in the absence of intermediate rewards, having gamma != 1 can indeed be useful to incentivise short wins.
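To illustrate why gamma != 1 favours short wins, here is a minimal Python sketch (not AlphaZero.jl code). It assumes a game with no intermediate rewards, a single final reward of -1, 0 or 1, and discounting by gamma once per move, seen from the winning player's perspective. The function name discounted_return is hypothetical.

```python
def discounted_return(final_reward: float, num_moves: int, gamma: float) -> float:
    """Value of the initial state when the only reward is the final one,
    received after `num_moves` moves and discounted by `gamma` per move
    (assumed convention, for illustration only)."""
    return (gamma ** num_moves) * final_reward


gamma = 0.9
short_win = discounted_return(1.0, 3, gamma)   # win after 3 moves
long_win = discounted_return(1.0, 10, gamma)   # win after 10 moves
print(short_win, long_win)
```

With gamma < 1 the shorter win gets the larger value, which is the incentive described above. Note that the discounted value is no longer exactly -1, 0 or 1, even though its sign is unchanged.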
I think the solution here should be to replace the ternary_rewards parameter with a ternary_outcome parameter. When this parameter is set to true, win-draw-loss statistics can be computed by treating a total reward of 0 as a draw, a positive total reward as a win and a negative total reward as a loss. This solution is more elegant, more general and agnostic about gamma.
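The sign-based classification proposed above could be sketched as follows. This is an illustrative Python version, not the AlphaZero.jl implementation. It assumes each game is given as its list of rewards and that the statistics use the undiscounted sum of those rewards (the thread does not specify this). The names outcome and win_draw_loss are hypothetical.

```python
from collections import Counter
from typing import Iterable, Sequence


def outcome(rewards: Sequence[float]) -> str:
    """Classify one game from the sign of its (undiscounted) total reward."""
    total = sum(rewards)
    if total > 0:
        return "win"
    if total < 0:
        return "loss"
    return "draw"  # a total of exactly 0 counts as a draw


def win_draw_loss(games: Iterable[Sequence[float]]) -> Counter:
    """Count wins, draws and losses over a collection of games."""
    return Counter(outcome(rewards) for rewards in games)


games = [
    [0.0, 0.0, 1.0],   # final win
    [0.0, 0.0, 0.0],   # draw
    [0.5, 0.0, -1.0],  # intermediate bonus, then a loss: total is -0.5
]
print(win_draw_loss(games))
```

Because only the sign is used, scaling a terminal-only reward by any gamma > 0 never changes the classification, which is what makes this approach agnostic about gamma.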
Thank you for clarifying. Looking at how rewards are propagated back towards the beginning of the game, I now see the issue with intermediate rewards and gamma.
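A minimal sketch of that backward propagation, to show why the resulting values stop being ternary. This is a simplified Python illustration, not AlphaZero.jl's code: it assumes a single player's perspective (no sign flipping between players) and the recursion z[t] = rewards[t] + gamma * z[t + 1]. The name value_targets is hypothetical.

```python
from typing import List, Sequence


def value_targets(rewards: Sequence[float], gamma: float) -> List[float]:
    """Propagate rewards back to the start of the game.

    rewards[t] is the reward received after move t. The target for state t is
    z[t] = rewards[t] + gamma * z[t + 1], with the value after the last move
    taken to be 0 (assumed convention, for illustration only).
    """
    z = 0.0
    targets = [0.0] * len(rewards)
    for t in reversed(range(len(rewards))):
        z = rewards[t] + gamma * z
        targets[t] = z
    return targets


# Terminal-only win, no discounting: every target is exactly 1.
print(value_targets([0.0, 0.0, 1.0], gamma=1.0))
# Same win with gamma = 0.5: targets shrink towards the start (0.25, 0.5, 1.0).
print(value_targets([0.0, 0.0, 1.0], gamma=0.5))
# Intermediate reward plus a final win: the initial target becomes 1.5.
print(value_targets([0.5, 0.0, 1.0], gamma=1.0))
```

Either ingredient alone (gamma != 1 or intermediate rewards) is enough to produce values outside {-1, 0, 1}.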
Hi! I discovered an issue with ternary outcome statistics for duels. When gamma is set to anything other than the default of 1.0, an assertion fails when the ternary outcome statistics are created (https://github.com/jonathan-laurent/AlphaZero.jl/blob/master/src/benchmark.jl#L111). I'm not sure how to deal with it.
Maybe we could stop using gamma for the evaluation report (https://github.com/jonathan-laurent/AlphaZero.jl/blob/66eaed8e4d8f60f8d535d949e5447b5c5f821ee8/src/benchmark.jl#L96), or change the assertion to
or maybe remove the assertion altogether.
Regards