mokemokechicken opened this issue 6 years ago
There is a lot of information here.
I have started a recent run. This is using ggp-zero (the reversi-alpha-zero implementation was the inspiration!). ggp-zero is a generic implementation of a 'zero' method and can train many different games; by 'zero' I mean starting with a random network and training via self-play using (PUCT or a variant of) MCTS. However, at this point the implementation (and goals) are very divergent from AlphaZero (I also drew inspiration from 'Thinking Fast and Slow'). For this run, I am running with multiple policy heads and multiple value heads, with no turn flipping of the network and no symmetry/rotation of the network.
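For anyone curious what "multiple policy heads and multiple value heads" could look like, here is a minimal, hypothetical Keras sketch. It is not ggp-zero's actual code; the trunk, layer sizes, and head count are made up purely for illustration.

# Hypothetical sketch of a shared trunk feeding several policy/value heads.
# Not ggp-zero's real architecture; sizes and head count are invented.
from tensorflow.keras import layers, Model

def build_multi_head_net(board_shape=(8, 8, 2), n_actions=65, n_heads=2):
    x = inp = layers.Input(shape=board_shape)
    for _ in range(4):                      # tiny convolutional trunk for brevity
        x = layers.Conv2D(64, 3, padding="same", activation="relu")(x)
    flat = layers.Flatten()(x)
    policies = [layers.Dense(n_actions, activation="softmax",
                             name="policy_%d" % i)(flat) for i in range(n_heads)]
    values = [layers.Dense(1, activation="tanh",
                           name="value_%d" % i)(flat) for i in range(n_heads)]
    return Model(inp, policies + values)    # all heads share the same trunk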
A previous run achieved approximately ntest level 7; however, there were no records. I will keep records up to date this time!
@gooooloo what does the ntest:6 result (6/1/3) for step-418500 mean?
@apollo-time In my report, ntest:6 means the opponent is NTest with strength 6, and 6/1/3 means 6 wins, 1 draw, 3 losses. By the way, you can find the win/draw game saves in https://github.com/gooooloo/reversi-alpha-zero-models/tree/master/ggf
@gooooloo Thanks for the reply. I see your model is very good. My model can't beat ntest depth 5 yet. How about policy and value loss? Mine is (0.15, 0.1) now.
@apollo-time
How about policy and value loss?
Policy loss: 0.4; value loss: 0.4-0.55. Unstable.
@gooooloo Um... but you use game history, don't you?
@apollo-time
Um... but you use game history, don't you?
Yes. I am actually guessing that the number of ResNet blocks in the model should be reduced depending on the number of historical boards: the fewer historical boards, the shallower the model should be. There is a bigger chance of overfitting if the network's input space is large. I sometimes observe a strange, bad move from my model right after several strong moves, which I cannot understand. Overfitting could be an explanation, but I cannot be sure.
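As a rough illustration of why a longer history enlarges the input space: with an AlphaZero-style encoding of two piece planes per historical board plus one side-to-move plane (an assumption for this sketch, not necessarily what any of our implementations use), the input grows linearly with the history length.

# Hypothetical encoding: 2 piece planes per historical board + 1 colour plane.
def input_planes(history_len):
    return 2 * history_len + 1

for h in (1, 2, 4, 8):
    print("history=%d -> input shape (%d, 8, 8)" % (h, input_planes(h)))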
I finished up my latest run, ending up somewhere between ntest 5-10, depending on the phase of the moon. Not too shabby. The policy loss was about 2.0 and the value loss was 0.14 - I found the above numbers very interesting. @gooooloo - congrats on creating such a strong model. Can you try ntest level 7 again with your latest model? I would love to see it get beaten. :)
@richemslie congrats on getting the result. ntest5 is already strong, I feel.
Can you try ntest level 7 again with your latest model
I have a detailed evaluation metric here. The model that beats ntest6 6/1/3 is step-418500 with 400 sims; it loses to ntest7... But it beats ntest9 7/0/3. It seems to me that beating ntest:x doesn't imply beating ntest:(x-1) or ntest:(x-2)...
@gooooloo I see your player's game tree and see that you subtract virtual loss after all simulations (800). Why do you not use parallel_search_num, and is that OK?
@apollo-time
subtract virtual loss after all simulations(800)
Do you mean I should subtract virtual loss every simulation, not only after all sims have finished? That is what I do inside backup(). My function naming is not accurate though; I will refine it later...
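For clarity, here is a minimal sketch of what I mean by handling virtual loss per simulation. It is not the repository's actual code, just the general pattern; the constant and field names are made up.

# Sketch only: virtual loss is added when an edge is selected and removed
# again in backup(), so pending simulations avoid piling onto the same path.
class Edge:
    def __init__(self):
        self.n = 0      # visit count
        self.w = 0.0    # total value

VIRTUAL_LOSS = 3

def apply_virtual_loss(edge):
    edge.n += VIRTUAL_LOSS
    edge.w -= VIRTUAL_LOSS

def backup(path, value):
    # path is the list of edges visited by one simulation, root first
    for edge in reversed(path):
        edge.n += 1 - VIRTUAL_LOSS       # undo virtual loss, add the real visit
        edge.w += value + VIRTUAL_LOSS   # undo virtual loss, add the real value
        value = -value                   # flip perspective between the players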
Why do you not use parallel_search_num and is it ok?
I just changed the implementation to a single-thread approach 3 days ago. I think the efficiency is unchanged, the code is simpler to read, and I can also use it for time control (@mokemokechicken pointed out that the coroutine way can also do time control, as in this issue).
Besides, I am using prediction_queue_size, and its value is the same as parallel_search_num in my config.
Yet my model was mostly trained with this version of the GameTree implementation. You can check that one too.
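As a rough sketch of the coroutine idea (assumed structure only, not the actual implementation; predict_batch and the queue wiring are hypothetical): many simulation coroutines push positions onto a queue, and one worker batches them for a single neural-network call.

# Hypothetical asyncio sketch of a prediction queue for single-threaded MCTS.
import asyncio

PREDICTION_QUEUE_SIZE = 16   # plays the role of parallel_search_num here

async def prediction_worker(queue, predict_batch):
    # Collect up to PREDICTION_QUEUE_SIZE pending positions, run one batched
    # network call, then hand each result back to its waiting simulation.
    while True:
        batch = [await queue.get()]
        while not queue.empty() and len(batch) < PREDICTION_QUEUE_SIZE:
            batch.append(queue.get_nowait())
        states, futures = zip(*batch)
        for fut, out in zip(futures, predict_batch(list(states))):
            fut.set_result(out)

async def simulate(queue, state):
    # One MCTS simulation suspends here until its prediction comes back.
    fut = asyncio.get_running_loop().create_future()
    await queue.put((state, fut))
    policy, value = await fut
    return policy, value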
@gooooloo Oh, I see, you subtract virtual loss inside backup(). I am checking now why my model can't beat ntest and Windows Reversi despite low policy and value loss. I'll check your previous version. Thanks.
Report: my step-473800 model with a 30-minutes-per-game time control beats ntest lv7-10, draws lv11, beats lv12-13, and loses to lv14 (2 draws, 2 losses). The game model can be found here; the evaluation game saves can be found here.
Why use 30 min? Actually I am targeting a 5-minute C++ implementation on an 8-core CPU, 1-GPU machine. But I don't have a C++ implementation yet, so I try the current Python implementation with 30 min. I believe they will be similar in simulation_number_per_move, which is about 13000. And I believe 5 minutes with a C++ implementation is reasonable for evaluating the AI's strength.
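A back-of-envelope check of that time control (only the 30-minute budget and the ~13000 sims/move come from the report above; the ~30 moves per player is an assumption):

# Rough arithmetic only; the per-player move count is an assumed figure.
budget_s = 30 * 60                              # 30 minutes per game
moves_per_player = 30                           # a reversi game is ~60 plies
seconds_per_move = budget_s / moves_per_player  # ~60 s per move
sims_per_second = 13000 / seconds_per_move      # ~217 simulations per second
print(seconds_per_move, sims_per_second)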
Report: I am getting Evaluator results as below. There have been many draws since the last generation model. That is interesting. In this case, even if I am lucky to get a new model later with 10 more wins than losses, it is hard to be convinced that it is a better model; I will regard it as just randomness.
I think maybe this is exactly the reason DeepMind doesn't use the AlphaGo Zero way for Chess and Shogi. Yes, they mention that reason in the AlphaZero paper, but I think I am experiencing it myself now ^^
Will change to the AlphaZero way soon...
486600-steps: 189 wins, 9 draws, 202 loses, -208.0 elo, < threshold(150).
488200-steps: 191 wins, 7 draws, 202 loses, -176.0 elo, < threshold(150).
489000-steps: 195 wins, 4 draws, 201 loses, -96.0 elo, < threshold(150).
490600-steps: 192 wins, 9 draws, 199 loses, -112.0 elo, < threshold(150).
491400-steps: 192 wins, 7 draws, 201 loses, -144.0 elo, < threshold(150).
492200-steps: 191 wins, 3 draws, 206 loses, -240.0 elo, < threshold(150).
493800-steps: 195 wins, 8 draws, 197 loses, -32.0 elo, < threshold(150).
494600-steps: 190 wins, 4 draws, 206 loses, -256.0 elo, < threshold(150).
496200-steps: 193 wins, 4 draws, 203 loses, -160.0 elo, < threshold(150).
497000-steps: 212 wins, 3 draws, 185 loses, 432.0 elo, >= threshold(150).
497800-steps: 152 wins, 71 draws, 177 loses, -400.0 elo, < threshold(150).
499400-steps: 169 wins, 60 draws, 171 loses, -32.0 elo, < threshold(150).
500200-steps: 176 wins, 44 draws, 180 loses, -64.0 elo, < threshold(150).
501000-steps: 159 wins, 61 draws, 180 loses, -336.0 elo, < threshold(150).
502600-steps: 176 wins, 44 draws, 180 loses, -64.0 elo, < threshold(150).
503400-steps: 174 wins, 51 draws, 175 loses, -16.0 elo, < threshold(150).
505000-steps: 166 wins, 62 draws, 172 loses, -96.0 elo, < threshold(150).
505800-steps: 161 wins, 55 draws, 184 loses, -368.0 elo, < threshold(150).
506600-steps: 179 wins, 43 draws, 178 loses, 16.0 elo, < threshold(150).
508200-steps: 167 wins, 53 draws, 180 loses, -208.0 elo, < threshold(150).
509000-steps: 170 wins, 62 draws, 168 loses, 32.0 elo, < threshold(150).
509800-steps: 168 wins, 59 draws, 173 loses, -80.0 elo, < threshold(150).
513000-steps: 166 wins, 55 draws, 179 loses, -208.0 elo, < threshold(150).
513800-steps: 176 wins, 63 draws, 161 loses, 240.0 elo, >= threshold(150).
515400-steps: 87 wins, 213 draws, 100 loses, -208.0 elo, < threshold(150).
516200-steps: 97 wins, 202 draws, 101 loses, -64.0 elo, < threshold(150).
517800-steps: 91 wins, 193 draws, 116 loses, -400.0 elo, < threshold(150).
518600-steps: 91 wins, 212 draws, 97 loses, -96.0 elo, < threshold(150).
520200-steps: 89 wins, 216 draws, 95 loses, -96.0 elo, < threshold(150).
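For what it's worth, the elo numbers above look consistent with a simple (wins - losses) * 16 score compared against the 150 threshold; the sketch below is only an inference from those numbers, not a confirmed detail of the Evaluator.

# Hypothetical gating check; the (wins - losses) * 16 formula is inferred
# from the log above, not taken from the evaluator's source.
ELO_THRESHOLD = 150

def passes_gate(wins, draws, losses):
    elo = (wins - losses) * 16
    return elo, elo >= ELO_THRESHOLD

print(passes_gate(212, 3, 185))   # (432, True)   matches the 497000-steps line
print(passes_gate(189, 9, 202))   # (-208, False) matches the 486600-steps line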
@gooooloo That's interesting. In my self-play, there was a time when draws were very frequent (more than 50%). There was also a time when black's winning percentage was more than 50%, and vice versa.
I think one of the advantages of the AlphaZero way is that it always replaces the model. However, one of the disadvantages is that it is easy to overfit the training data and become weak. So I feel that the "training/self-play ratio" of #38 is very important.
Wondering which is currently the strongest model. Is Challenge 1 still the strongest? Also, sh ./download_model.sh 2 seems to save to data/model/model_best_*, but config.py (and the README) seems to expect it in /data/model/next_generation/*
Hi - a new record for gzero (of a different kind: playing equal to ntest level 3 after only 12 hours of training). I discovered a pretty bad bug with the PUCT constants this morning, which is the reason for a new run.
Some points:
@richemslie Your C++ implementation looks very interesting. I'm curious how many self-play games per second or per minute you can generate. Taking into account your small architecture, it may be possible to compare your speed with that of Python-only implementations.
@AranKomat - it is hard to give exact numbers for comparison, but using a batch size of 1024 on a 1080ti card, it can be 100% saturated with 2 C++ threads. I am seeing 20480 model evaluations per second, and hence for ~60 moves at 800 evaluations per move, that is about 2.3 seconds per game. For the tiny (initial) network (8 res blocks, 64 filters, 64 hidden), it took 3 C++ threads to saturate and was about twice as fast (1.1 seconds). The second card is throttled to prevent it overheating (adding a sleep into my optimised C++ code was painful!) - it roughly works out to 14 seconds per game. Note that the reversi game is defined in a Prolog-like language and then interpreted... so if it were a custom reversi implementation in C++, it would be much faster, and it wouldn't require so many threads to saturate the GPU.
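A quick sanity check of those throughput figures (assuming the 20480 is model evaluations per second, which matches the quoted ~2.3 s per game):

# Rough arithmetic only; 20480 evals/s is assumed from the numbers above.
evals_per_game = 60 * 800                     # ~60 moves at 800 evals/move
seconds_per_game = evals_per_game / 20480     # ~2.34 s per game
print(seconds_per_game)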
@mokemokechicken achieved 22 seconds per game with 256 filters, a single 1080ti, and half as many sims/move. I assume it would take 22 s per game with two 1080ti's and 800 sims/move. Though the architecture's FLOPs are (256/96)^2 ~ 7 times larger, GPU and TF scale weirdly. So, if I assume yours would take 2.3 x 3 s per game with the same architecture as moke's, the speedup from using C++ is probably 22/(2.3 x 3) ~ 3.3 times? I'm looking forward to your updates!
@fuzzthink
I am sorry to have confused you.
Wondering which is currently the strongest model. Is Challenge 1 still the strongest?
Now, "challenge 5 model" and "ch5 config" are strongest in my models.
Also, sh ./download_model.sh 2 seems to save to data/model/model_best_*, but config.py (and the README) seems to expect it in /data/model/next_generation/*
Please remove (or rename) the data/model/next_generation/ directory if you want to use the "BestModel" at data/model/model_best_*.
For example,
rm -rf data/model/next_generation/
sh ./download_model.sh 5
# run as wxPython GUI
python src/reversi_zero/run.py play_gui --type ch5
If you want to use it as an NBoard engine, please use nboard_engine --type ch5 for the command.
Please share your reversi model achievements! Models that are not from reversi-alpha-zero are also welcome: battle records, configuration, repository URL, comments, and so on.