hans-ekbrand / lc0-match

GNU General Public License v3.0

Don't send moves as bestmove if they have less than a threshold of visits #4

Open hans-ekbrand opened 5 years ago

hans-ekbrand commented 5 years ago

Occasionally the raw NN-eval is really bad, e.g. Leela 32306 in this position:

position startpos moves e2e4 e7e5 g1f3 b8c6 f1b5 a7a6 b5a4 d7d6 c2c3 f7f5 e4f5 c8f5 e1g1 f5d3 f1e1 f8e7 a4c2 d3c2 d1c2 g8f6 d2d4 d8d7 d4e5 c6e5 f3e5 d6e5 b1d2 e8c8 h2h3 h8e8 e1e5 e7c5 e5e8 d8e8 d2f3 f6e4 f3d4 c5d4 c3d4 d7d4 c1e3 d4e5 a1e1 e4f6 e3d2 e5d5 e1e8 f6e8 c2a4 d5d2 a4e8 d2d8 e8e6 c8b8 e6e5 d8d7 f2f4 h7h5 g1f2 d7d2 f2g3 h5h4 g3h4 d2g2 f4f5 b7b6 h4h5 a6a5 h3h4 b8b7 a2a4 b7b8 b2b3 b8b7 e5e7 g2g3 e7g5 g3b3 g5g7 b6b5 f5f6 b5a4 f6f7 b3f3 h5h6 f3f4 h6h7 f4h4 g7h6 h4e4 h6g6 e4h1 h7g7 a4a3 g6b6

In a game, lc0-match lost because it sent g6b6, the move leading to this position, as bestmove after a single visit (no subnodes at all):


```
0321 00:52:23.328634 140375646459648 ../../src/mcts_replace/search.cc:355] === Move stats:
0321 00:52:23.328660 140375646459648 ../../src/mcts_replace/search.cc:356] f7f8q (1837) N:     164 (+164) (P: 43.15%) (Q:  0.15806) (U: 0.15806) (Q+U:  0.31613) (V:  -.----)
0321 00:52:23.328680 140375646459648 ../../src/mcts_replace/search.cc:356] g6c2  (1328) N:      53 (+53) (P: 13.27%) (Q:  0.08933) (U: 0.08933) (Q+U:  0.17866) (V:  -.----)
0321 00:52:23.328689 140375646459648 ../../src/mcts_replace/search.cc:356] g6e6  (1344) N:      58 (+58) (P: 11.87%) (Q:  0.11488) (U: 0.11488) (Q+U:  0.22976) (V:  -.----)
0321 00:52:23.328697 140375646459648 ../../src/mcts_replace/search.cc:356] g6d3  (1330) N:      49 (+49) (P: 11.30%) (Q:  0.11008) (U: 0.11008) (Q+U:  0.22016) (V:  -.----)
0321 00:52:23.328706 140375646459648 ../../src/mcts_replace/search.cc:356] g6f5  (1337) N:      37 (+37) (P:  7.29%) (Q:  0.19244) (U: 0.19244) (Q+U:  0.38488) (V:  -.----)
0321 00:52:23.328714 140375646459648 ../../src/mcts_replace/search.cc:356] f7f8r (1838) N:      32 (+32) (P:  7.00%) (Q: -0.27248) (U: -0.27248) (Q+U: -0.54497) (V:  -.----)
0321 00:52:23.328722 140375646459648 ../../src/mcts_replace/search.cc:356] g6f6  (1345) N:      14 (+14) (P:  1.93%) (Q:  0.10987) (U: 0.10987) (Q+U:  0.21973) (V:  -.----)
0321 00:52:23.328730 140375646459648 ../../src/mcts_replace/search.cc:356] g6g5  (1338) N:    2721 (+2721) (P:  0.64%) (Q:  0.39144) (U: 0.39144) (Q+U:  0.78287) (V:  -.----)
0321 00:52:23.328737 140375646459648 ../../src/mcts_replace/search.cc:356] g7g8  (1572) N:       5 (+ 5) (P:  0.47%) (Q:  0.03439) (U: 0.03439) (Q+U:  0.06878) (V:  -.----)
0321 00:52:23.328745 140375646459648 ../../src/mcts_replace/search.cc:356] g6b1  (1326) N:      10 (+10) (P:  0.43%) (Q: -0.41795) (U: -0.41795) (Q+U: -0.83590) (V:  -.----)
0321 00:52:23.328753 140375646459648 ../../src/mcts_replace/search.cc:356] g6g4  (1334) N:       6 (+ 6) (P:  0.39%) (Q:  0.05635) (U: 0.05635) (Q+U:  0.11270) (V:  -.----)
0321 00:52:23.328761 140375646459648 ../../src/mcts_replace/search.cc:356] g6g3  (1331) N:       6 (+ 6) (P:  0.39%) (Q:  0.27388) (U: 0.27388) (Q+U:  0.54776) (V:  -.----)
0321 00:52:23.328769 140375646459648 ../../src/mcts_replace/search.cc:356] f7f8b (1839) N:       4 (+ 4) (P:  0.31%) (Q: -0.70517) (U: -0.70517) (Q+U: -1.41034) (V:  -.----)
0321 00:52:23.328777 140375646459648 ../../src/mcts_replace/search.cc:356] g6e4  (1332) N:       4 (+ 4) (P:  0.21%) (Q: -0.65308) (U: -0.65308) (Q+U: -1.30616) (V:  -.----)
0321 00:52:23.328784 140375646459648 ../../src/mcts_replace/search.cc:356] g6h7  (1350) N:       4 (+ 4) (P:  0.20%) (Q: -0.00180) (U: -0.00180) (Q+U: -0.00359) (V:  -.----)
0321 00:52:23.328792 140375646459648 ../../src/mcts_replace/search.cc:356] g6h6  (1346) N:       2 (+ 2) (P:  0.20%) (Q:  0.07331) (U: 0.07331) (Q+U:  0.14661) (V:  -.----)
0321 00:52:23.328800 140375646459648 ../../src/mcts_replace/search.cc:356] g7f6  (1560) N:       2 (+ 2) (P:  0.17%) (Q: -0.28903) (U: -0.28903) (Q+U: -0.57806) (V:  -.----)
0321 00:52:23.328808 140375646459648 ../../src/mcts_replace/search.cc:356] g6h5  (1339) N:       3 (+ 3) (P:  0.13%) (Q: -0.69640) (U: -0.69640) (Q+U: -1.39280) (V:  -.----)
0321 00:52:23.328816 140375646459648 ../../src/mcts_replace/search.cc:356] g6a6  (1340) N:       2 (+ 2) (P:  0.10%) (Q: -0.21956) (U: -0.21956) (Q+U: -0.43912) (V:  -.----)
0321 00:52:23.328823 140375646459648 ../../src/mcts_replace/search.cc:356] g7f8  (1571) N:       2 (+ 2) (P:  0.09%) (Q: -0.25899) (U: -0.25899) (Q+U: -0.51798) (V:  -.----)
0321 00:52:23.328831 140375646459648 ../../src/mcts_replace/search.cc:356] g6b6  (1341) N:       1 (+ 1) (P:  0.08%) (Q:  0.41163) (U: 0.41163) (Q+U:  0.82326) (V:  -.----)
0321 00:52:23.328839 140375646459648 ../../src/mcts_replace/search.cc:356] g6d6  (1343) N:       1 (+ 1) (P:  0.08%) (Q: -0.24608) (U: -0.24608) (Q+U: -0.49216) (V:  -.----)
0321 00:52:23.328846 140375646459648 ../../src/mcts_replace/search.cc:356] g6g2  (1329) N:       2 (+ 2) (P:  0.08%) (Q: -0.67605) (U: -0.67605) (Q+U: -1.35209) (V:  -.----)
0321 00:52:23.328854 140375646459648 ../../src/mcts_replace/search.cc:356] f7f8n (1544) N:       1 (+ 1) (P:  0.08%) (Q: -0.61137) (U: -0.61137) (Q+U: -1.22275) (V:  -.----)
0321 00:52:23.328861 140375646459648 ../../src/mcts_replace/search.cc:356] g6c6  (1342) N:       2 (+ 2) (P:  0.07%) (Q: -0.44484) (U: -0.44484) (Q+U: -0.88967) (V:  -.----)
0321 00:52:23.328873 140375646459648 ../../src/mcts_replace/search.cc:356] g6g1  (1327) N:       2 (+ 2) (P:  0.06%) (Q: -0.95310) (U: -0.95310) (Q+U: -1.90621) (V:  -.----)
0321 00:52:23.328885 140375646459648 ../../src/chess/uciloop.cc:218] << bestmove g6b6
```

This should be rare, but even so we need some mechanism that vetoes a bestmove candidate with too few visits, i.e. one whose Q estimate is too uncertain. I think a simple absolute threshold would do the job, e.g. filter out moves with fewer than 20 visits when selecting bestmove.
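
A minimal sketch of such a filter, not the actual lc0-match code. It assumes bestmove is currently picked by highest Q (which is what the log above suggests, since g6b6 has the highest Q with N: 1). `MoveStat`, `SelectBestMove` and `min_visits` are hypothetical names, and falling back to the most visited move when nothing passes the threshold is my own assumption.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical per-move record for illustration only; the real edge/node
// types in lc0-match look different.
struct MoveStat {
  std::string move;      // UCI move string, e.g. "g6g5"
  std::uint32_t visits;  // N in the move stats
  double q;              // Q in the move stats
};

// Returns the highest-Q move among those with at least `min_visits` visits.
// If no move reaches the threshold (assumed fallback), returns the most
// visited move instead. Returns nullptr only when `moves` is empty.
const MoveStat* SelectBestMove(const std::vector<MoveStat>& moves,
                               std::uint32_t min_visits = 20) {
  const MoveStat* best = nullptr;          // best Q among trusted moves
  const MoveStat* most_visited = nullptr;  // fallback candidate
  for (const MoveStat& m : moves) {
    if (most_visited == nullptr || m.visits > most_visited->visits) {
      most_visited = &m;
    }
    if (m.visits < min_visits) continue;  // Q too uncertain, veto
    if (best == nullptr || m.q > best->q) best = &m;
  }
  return best != nullptr ? best : most_visited;
}
```

With the stats from the log above and a threshold of 20, g6b6 (N: 1) is vetoed and g6g5 (N: 2721, Q: 0.39144) would be sent instead.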

hans-ekbrand commented 5 years ago

For ponder and UCI info output, we don't need any threshold.