poja / Cattus

Cattus is a chess engine based on DeepMind's AlphaZero paper, written in Rust. It uses a neural network to evaluate positions and MCTS as its search algorithm.

Solve Hex! and figure out hyperparams #29

Open barakugav opened 2 years ago

barakugav commented 2 years ago

Train a two-headed network until the engine wins against us consistently. Understand how long such training requires and with what hyperparameters:

- learning rate
- number of games generated in self-play
- temperature for softmax, and for how many moves we should use softmax
- model structure
- whether we should take a single position from each game or more

A lot of this can be taken from lc0.
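For reference, here is a minimal sketch of the knobs listed above as a single config. The field names and values are illustrative assumptions, not the actual `train/config.json` schema:

```python
# Illustrative only: hypothetical field names, not the real train/config.json schema.
config = {
    "learning_rate": 1e-3,          # optimizer step size
    "self_play_games_num": 100,     # games generated per self-play iteration
    "mcts_simulations": 1000,       # MCTS simulations per move
    "softmax_temperature": 1.0,     # temperature applied to MCTS visit counts
    "softmax_moves": 30,            # after this many moves, play greedily (temperature -> 0)
    "model": "TwoHeadedNetV0",      # hypothetical name for a policy+value network
    "positions_per_game": "all",    # or sample a single position from each game
}
```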

poja commented 1 year ago

What is your intuition about this initial attempt? https://github.com/poja/RL/blob/smaller-hex-2/train/config.json

barakugav commented 1 year ago

I think the ratio of self-play to training is too high. In each iteration we will play 100 games with ~60 positions each and 1000 simulations per move, 6,000,000 network evaluations in total. If we generate 6000 positions each iteration, I think we should train on at least 20000 entries, and we can choose them from the latest 100000 entries.

In general, self-play is much more computationally heavy, and we want to get the most from each data entry, so let's train A LOT!
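A quick back-of-the-envelope sketch of the numbers above and the proposed sampling, just to make the ratio explicit (plain Python, using the rough figures from this comment):

```python
# Numbers taken from the comment above; rough averages, not measurements.
games_per_iter = 100
positions_per_game = 60          # approximate game length
sims_per_move = 1000             # MCTS simulations per move

positions_per_iter = games_per_iter * positions_per_game   # ~6,000 new entries
net_evals_per_iter = positions_per_iter * sims_per_move    # ~6,000,000 network evaluations

# Proposed training side: sample 20,000 entries per iteration
# from a replay window of the latest 100,000 entries.
train_entries_per_iter = 20_000
replay_window = 100_000

print(positions_per_iter, net_evals_per_iter)        # 6000 6000000
print(train_entries_per_iter / positions_per_iter)   # each new entry is seen ~3.3 times on average
```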

poja commented 1 year ago

I want to think more about this, but meanwhile see the attached results with the above config. Notice how at some point the learning stops (in policy before value). And the last row seems "lucky" to me in the value loss, i.e. the next row wouldn't necessarily be as good.

Also, importantly, this is 4x4 hex, which somewhat affects the numbers you mentioned and the intuition (but it is almost the same order of magnitude).

221025_175846.txt

barakugav commented 1 year ago

If the learning stops so fast, maybe our learning rate is too high. What do you think about 10^-3 in the first 50 iterations and 10^-4 in the remaining 50?
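A minimal sketch of that schedule, assuming the training loop can look up the learning rate per iteration (the real loop may configure this differently):

```python
def learning_rate(iteration: int) -> float:
    # Proposed schedule: 1e-3 for the first 50 iterations, 1e-4 afterwards.
    return 1e-3 if iteration < 50 else 1e-4
```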

Also, policy accuracy of 0.3 is not terrible when you have 16 options (random guessing would give ~0.06), but I suspect the network is very limited. Would you like to try ConvNetV1 instead?

And another point about the number of training entries: if latest_data_entries=1000, this basically says you only learn from the last iteration's data. I really think we should either increase latest_data_entries and iteration_data_entries, or decrease games_num to ~10 and do 1000 iterations.
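To make that concrete, a small sketch of why a window of 1000 entries never reaches past the current iteration, reusing the same rough per-iteration numbers as earlier in the thread (assumptions, not measurements):

```python
# With ~6,000 new positions per iteration (100 games x ~60 positions),
# a window of only the latest 1,000 entries stays inside the current iteration.
positions_per_iter = 100 * 60
latest_data_entries = 1_000
iterations_covered = latest_data_entries / positions_per_iter   # ~0.17 iterations

# Alternative from the comment: drop games_num to ~10 (~600 entries per iteration)
# and run ~1,000 iterations, or simply enlarge latest_data_entries so the replay
# window spans many past iterations.
print(iterations_covered)
```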

barakugav commented 1 year ago

Nevertheless, this is super exciting! Can't wait to play against it.