shindavid / AlphaZeroArcade

Root Policy Softmax Temperature #35

Closed shindavid closed 1 year ago

shindavid commented 1 year ago

The original KataGo paper used a constant root softmax temperature of 1.03, which is currently the default in our implementation. The subsequent blog post, however, describes something else:

"In KataGo's g170 run, this temperature was 1.25 for the early game, decaying exponentially to 1.1 for the rest of the game with a halflife in turns of the board dimensions"

We should experiment with this alternative parameterization.
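For concreteness, here is a minimal Python sketch of the schedule the blog post describes. It is not code from this repo or from KataGo. The function name and parameters are hypothetical, and it assumes "a halflife in turns of the board dimensions" means the half-life equals the board side length (e.g. 19 turns on a 19x19 board).

```python
def root_softmax_temperature(move_number, board_size,
                             early_temp=1.25, late_temp=1.1):
    """Temperature that decays exponentially from early_temp toward late_temp.

    Assumption: the half-life, measured in turns, equals board_size.
    """
    if move_number < 0:
        raise ValueError("move_number must be non-negative")
    if board_size <= 0:
        raise ValueError("board_size must be positive")
    # Fraction of the initial excess (early_temp - late_temp) still remaining.
    remaining = 0.5 ** (move_number / float(board_size))
    return late_temp + (early_temp - late_temp) * remaining


if __name__ == "__main__":
    for move in (0, 19, 38, 200):
        print(move, round(root_softmax_temperature(move, 19), 4))
```

Under these assumptions the temperature starts at 1.25, sits halfway between 1.25 and 1.1 after one half-life, and approaches 1.1 from above for the rest of the game.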

Note that the original idea comes from an academic group's replication of AlphaGo. The rationale for the idea and the experimental evidence supporting it are described here.

shindavid commented 1 year ago

In a subsequent conversation, David Wu explained that KataGo now uses an initial root policy softmax temperature of 1.4, decaying to something lower (1.1?) over the course of the game.
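To illustrate what a temperature such as 1.4 does to the root priors, here is a hedged Python sketch. It is not the code in this repo; the helper name is hypothetical. It assumes the temperature T is applied by raising each prior to the power 1/T and renormalizing, which is equivalent to dividing the policy logits by T before the softmax.

```python
import math


def apply_root_temperature(priors, temperature):
    """Return priors ** (1 / temperature), renormalized to sum to 1.

    Temperatures above 1 flatten the distribution, so low-prior moves
    receive somewhat more exploration at the root.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    max_p = max(priors)
    if max_p <= 0:
        raise ValueError("at least one prior must be positive")
    inv_t = 1.0 / temperature
    # Work in log space relative to the largest prior for numerical stability.
    scaled = [
        math.exp(inv_t * (math.log(p) - math.log(max_p))) if p > 0 else 0.0
        for p in priors
    ]
    total = sum(scaled)
    return [s / total for s in scaled]


if __name__ == "__main__":
    print(apply_root_temperature([0.7, 0.2, 0.1], 1.4))
```

With a temperature of 1.0 the priors come back unchanged; with 1.4 the largest prior shrinks and the smallest grows, while zero-probability moves stay at zero.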

This has been implemented.