official-stockfish / Stockfish

A free and strong UCI chess engine
https://stockfishchess.org/
GNU General Public License v3.0

2moves_v1.pgn improvement #1853

Closed FauziAkram closed 4 years ago

FauziAkram commented 5 years ago

A running project to improve 2moves_v1.pgn by removing "bad" starting positions: positions with a forced drawing line from the first move, or extremely one-sided positions. This first patch removes 37 positions; more patches may follow if it is approved.

Discussion: https://groups.google.com/forum/?fromgroups=#!topic/fishcooking/cO5bF2_a6Ow

Sheet: https://1drv.ms/x/s!AujF4uRoZmV9gmYIFJgGk7CGvdCt

2moves_v2a.pgn (the new PGN after removing the first patch of "bad" positions): https://1drv.ms/u/s!AujF4uRoZmV9gmhe7wdKD-2zP9en

GuardianRM commented 5 years ago

Good job. It seems to me that the next step could be to replace some of the most popular 2-move positions with 3-move positions.

Then it will be necessary to test Stockfish 10 against Stockfish 3 (or maybe Stockfish 4) to identify one-sided positions.

Possible classical starting points for extending the book (popular 2-move positions where Black scores at least 40% of the points in the correspondence database; the approximate number of games is given in brackets):

1) e4-e5 2) Nf3-Nc6 (561k)
1) e4-e5 2) Nf3-Nf6 (93k)
1) e4-e5 2) Nf3-d6 (82k)

1) e4-e5 2) Bc4-Nf6 (35k)
1) e4-e5 2) f4-ef (45k)
1) e4-e5 2) d4-ed (26k)
1) e4-e5 2) Nc3-Nf6 (22k)

1) e4-c5 2) Nf3-d6 (389k)
1) e4-c5 2) Nf3-Nc6 (199k)
1) e4-c5 2) Nf3-e6 (91k)
1) e4-c5 2) Nf3-g6 (15k)
1) e4-c5 2) Nf3-a6 (8k)

1) e4-c5 2) Nc3-Nc6 (36k)
1) e4-c5 2) Nc3-d6 (14k)
1) e4-c5 2) c3-Nf6 (20k)
1) e4-c5 2) c3-d5 (14k)
1) e4-c5 2) Bc4-e6 (15k)
1) e4-c5 2) Bc4-Nc6 (10k)
1) e4-c5 2) d4-cd (27k)
1) e4-c5 2) f4-Nc6 (5k)

etc.

For example, from the classical start 1) e4-e5 2) Nf3-Nc6 alone you can get about 500 additional positions whose assessment is at worst roughly "?!".

There may be some one-sidedness when playing certain openings, but given their popularity this should not be a problem, because we are only supplementing the 2_moves.pgn book with the most popular lines from a theoretical 3_moves.pgn.

Below are ~200 manually collected positions (about 30-40% of those possible) arising from 1) e4-e5 2) Nf3-Nc6. Questionable positions are marked with a "?".

GuardianRM commented 5 years ago

Spanish

r1bqkbnr/1ppp1ppp/p1n5/1B2p3/4P3/5N2/PPPP1PPP/RNBQK2R w KQkq - 0 4 r1bqkb1r/pppp1ppp/2n2n2/1B2p3/4P3/5N2/PPPP1PPP/RNBQK2R w KQkq - 0 4 r1bqkbnr/ppp2ppp/2np4/1B2p3/4P3/5N2/PPPP1PPP/RNBQK2R w KQkq - 0 4 r1bqk1nr/pppp1ppp/2n5/1Bb1p3/4P3/5N2/PPPP1PPP/RNBQK2R w KQkq - 0 4 r1bqkbnr/pppp2pp/2n5/1B2pp2/4P3/5N2/PPPP1PPP/RNBQK2R w KQkq - 0 4 r1bqkbnr/pppp1ppp/8/1B2p3/3nP3/5N2/PPPP1PPP/RNBQK2R w KQkq - 0 4 r1bqkb1r/ppppnppp/2n5/1B2p3/4P3/5N2/PPPP1PPP/RNBQK2R w KQkq - 0 4 r1b1kbnr/pppp1ppp/2n2q2/1B2p3/4P3/5N2/PPPP1PPP/RNBQK2R w KQkq - 0 4 r1bqkbnr/pppp1p1p/2n3p1/1B2p3/4P3/5N2/PPPP1PPP/RNBQK2R w KQkq - 0 4 r1bqk1nr/pppp1ppp/2nb4/1B2p3/4P3/5N2/PPPP1PPP/RNBQK2R w KQkq - 0 4

r1bqkbnr/pppp2pp/2n2p2/1B2p3/4P3/5N2/PPPP1PPP/RNBQK2R w KQkq - 0 4 ?

r1bqk1nr/pppp1ppp/2n5/1B2p3/1b2P3/5N2/PPPP1PPP/RNBQK2R w KQkq - 0 4 r1bqkbnr/ppp2ppp/2n5/1B1pp3/4P3/5N2/PPPP1PPP/RNBQK2R w KQkq - 0 4 r1bqk1nr/ppppbppp/2n5/1B2p3/4P3/5N2/PPPP1PPP/RNBQK2R w KQkq - 0 4 r1bqkbnr/pppp1pp1/2n4p/1B2p3/4P3/5N2/PPPP1PPP/RNBQK2R w KQkq - 0 4 r1b1kbnr/ppppqppp/2n5/1B2p3/4P3/5N2/PPPP1PPP/RNBQK2R w KQkq - 0 4 r1bqkbnr/1ppp1ppp/2n5/pB2p3/4P3/5N2/PPPP1PPP/RNBQK2R w KQkq - 0 4 r1bqkbnr/p1pp1ppp/1pn5/1B2p3/4P3/5N2/PPPP1PPP/RNBQK2R w KQkq - 0 4

r1bqkbnr/pppp1ppp/8/1B2p3/1n2P3/5N2/PPPP1PPP/RNBQK2R w KQkq - 0 4 ?

r1bqkbnr/ppppnppp/8/1B2p3/4P3/5N2/PPPP1PPP/RNBQK2R w KQkq - 0 4 ?

r1bqkbnr/pppp1ppp/8/nB2p3/4P3/5N2/PPPP1PPP/RNBQK2R w KQkq - 0 4 rnbqkbnr/pppp1ppp/8/1B2p3/4P3/5N2/PPPP1PPP/RNBQK2R w KQkq - 0 4

r1bqkbnr/pppp1p1p/2n5/1B2p1p1/4P3/5N2/PPPP1PPP/RNBQK2R w KQkq - 0 4 ?

r1bqkb1r/pppp1ppp/2n4n/1B2p3/4P3/5N2/PPPP1PPP/RNBQK2R w KQkq - 0 4 r1bqkbnr/pppp1pp1/2n5/1B2p2p/4P3/5N2/PPPP1PPP/RNBQK2R w KQkq - 0 4 1rbqkbnr/pppp1ppp/2n5/1B2p3/4P3/5N2/PPPP1PPP/RNBQK2R w KQk - 0 4

Italian

r1bqkb1r/pppp1ppp/2n2n2/4p3/2B1P3/5N2/PPPP1PPP/RNBQK2R w KQkq - 0 4 r1bqk1nr/pppp1ppp/2n5/2b1p3/2B1P3/5N2/PPPP1PPP/RNBQK2R w KQkq - 0 4 r1bqkbnr/pppp1pp1/2n4p/4p3/2B1P3/5N2/PPPP1PPP/RNBQK2R w KQkq - 0 4 r1bqk1nr/ppppbppp/2n5/4p3/2B1P3/5N2/PPPP1PPP/RNBQK2R w KQkq - 0 4 r1bqkbnr/ppp2ppp/2np4/4p3/2B1P3/5N2/PPPP1PPP/RNBQK2R w KQkq - 0 4 r1bqkbnr/pppp1ppp/8/4p3/2BnP3/5N2/PPPP1PPP/RNBQK2R w KQkq - 0 4 r1bqkbnr/1ppp1ppp/p1n5/4p3/2B1P3/5N2/PPPP1PPP/RNBQK2R w KQkq - 0 4 r1bqkbnr/pppp2pp/2n5/4pp2/2B1P3/5N2/PPPP1PPP/RNBQK2R w KQkq - 0 4 r1bqkb1r/pppp1ppp/2n4n/4p3/2B1P3/5N2/PPPP1PPP/RNBQK2R w KQkq - 0 4 r1b1kbnr/ppppqppp/2n5/4p3/2B1P3/5N2/PPPP1PPP/RNBQK2R w KQkq - 0 4 r1bqkbnr/pppp1ppp/8/n3p3/2B1P3/5N2/PPPP1PPP/RNBQK2R w KQkq - 0 4 r1bqkbnr/pppp1p1p/2n3p1/4p3/2B1P3/5N2/PPPP1PPP/RNBQK2R w KQkq - 0 4 r1bqk1nr/pppp1ppp/2nb4/4p3/2B1P3/5N2/PPPP1PPP/RNBQK2R w KQkq - 0 4 r1bqk1nr/pppp1ppp/2n5/4p3/1bB1P3/5N2/PPPP1PPP/RNBQK2R w KQkq - 0 4 r1bqkbnr/ppp2ppp/2n5/3pp3/2B1P3/5N2/PPPP1PPP/RNBQK2R w KQkq - 0 4 r1bqkb1r/ppppnppp/2n5/4p3/2B1P3/5N2/PPPP1PPP/RNBQK2R w KQkq - 0 4 r1bqkbnr/p1pp1ppp/1pn5/4p3/2B1P3/5N2/PPPP1PPP/RNBQK2R w KQkq - 0 4 r1bqkbnr/1ppp1ppp/2n5/p3p3/2B1P3/5N2/PPPP1PPP/RNBQK2R w KQkq - 0 4 r1bqkbnr/pppp1pp1/2n5/4p2p/2B1P3/5N2/PPPP1PPP/RNBQK2R w KQkq - 0 4 r1bqkbnr/p1pp1ppp/2n5/1p2p3/2B1P3/5N2/PPPP1PPP/RNBQK2R w KQkq - 0 4 r1bqkbnr/pppp1p1p/2n5/4p1p1/2B1P3/5N2/PPPP1PPP/RNBQK2R w KQkq - 0 4 r1bqkbnr/pppp1ppp/8/4p3/1nB1P3/5N2/PPPP1PPP/RNBQK2R w KQkq - 0 4

rnbqkbnr/pppp1ppp/8/4p3/2B1P3/5N2/PPPP1PPP/RNBQK2R w KQkq - 0 4 ?

1rbqkbnr/pppp1ppp/2n5/4p3/2B1P3/5N2/PPPP1PPP/RNBQK2R w KQk - 0 4

r1bqkbnr/ppppnppp/8/4p3/2B1P3/5N2/PPPP1PPP/RNBQK2R w KQkq - 0 4 ?

Scotch

r1bqkbnr/pppp1ppp/2n5/8/3pP3/5N2/PPP2PPP/RNBQKB1R w KQkq - 0 4 r1bqkbnr/ppp2ppp/2np4/4p3/3PP3/5N2/PPP2PPP/RNBQKB1R w KQkq - 0 4 r1bqk1nr/pppp1ppp/2nb4/4p3/3PP3/5N2/PPP2PPP/RNBQKB1R w KQkq - 0 4 r1bqkb1r/pppp1ppp/2n2n2/4p3/3PP3/5N2/PPP2PPP/RNBQKB1R w KQkq - 0 4 r1bqkbnr/ppp2ppp/2n5/3pp3/3PP3/5N2/PPP2PPP/RNBQKB1R w KQkq - 0 4 r1bqkbnr/pppp2pp/2n2p2/4p3/3PP3/5N2/PPP2PPP/RNBQKB1R w KQkq - 0 4 r1bqkbnr/pppp1ppp/8/4p3/3nP3/5N2/PPP2PPP/RNBQKB1R w KQkq - 0 4 r1b1kbnr/pppp1ppp/2n2q2/4p3/3PP3/5N2/PPP2PPP/RNBQKB1R w KQkq - 0 4 r1bqk1nr/pppp1ppp/2n5/4p3/1b1PP3/5N2/PPP2PPP/RNBQKB1R w KQkq - 0 4 r1b1kbnr/ppppqppp/2n5/4p3/3PP3/5N2/PPP2PPP/RNBQKB1R w KQkq - 0 4 r1bqkbnr/pppp2pp/2n5/4pp2/3PP3/5N2/PPP2PPP/RNBQKB1R w KQkq - 0 4 r1bqkbnr/pppp1pp1/2n4p/4p3/3PP3/5N2/PPP2PPP/RNBQKB1R w KQkq - 0 4

r1bqk1nr/ppppbppp/2n5/4p3/3PP3/5N2/PPP2PPP/RNBQKB1R w KQkq - 0 4 ?

r1bqkbnr/pppp1p1p/2n3p1/4p3/3PP3/5N2/PPP2PPP/RNBQKB1R w KQkq - 0 4

r1bqkbnr/p1pp1ppp/1pn5/4p3/3PP3/5N2/PPP2PPP/RNBQKB1R w KQkq - 0 4 ?

r1bqkbnr/pppp1ppp/8/4p3/1n1PP3/5N2/PPP2PPP/RNBQKB1R w KQkq - 0 4 ?

r1bqkb1r/pppp1ppp/2n4n/4p3/3PP3/5N2/PPP2PPP/RNBQKB1R w KQkq - 0 4 ?

r1bqkbnr/ppppnppp/8/4p3/3PP3/5N2/PPP2PPP/RNBQKB1R w KQkq - 0 4 ?

rnbqkbnr/pppp1ppp/8/4p3/3PP3/5N2/PPP2PPP/RNBQKB1R w KQkq - 0 4 ?

r1bqkbnr/p1pp1ppp/2n5/1p2p3/3PP3/5N2/PPP2PPP/RNBQKB1R w KQkq - 0 4 ?

GuardianRM commented 5 years ago

3-4 Knights

r1bqkb1r / pppp1ppp / 2n2n2 / 4p3 / 4P3 / 2N2N2 / PPPP1PPP / R1BQKB1R w KQkq - 0 4 r1bqk1nr / pppp1ppp / 2n5 / 2b1p3 / 4P3 / 2N2N2 / PPPP1PPP / R1BQKB1R w KQkq - 0 4 r1bqkbnr / ppp2ppp / 2np4 / 4p3 / 4P3 / 2N2N2 / PPPP1PPP / R1BQKB1R w KQkq - 0 4 r1bqkbnr / 1ppp1ppp / p1n5 / 4p3 / 4P3 / 2N2N2 / PPPP1PPP / R1BQKB1R w KQkq - 0 4 r1bqkbnr / pppp1p1p / 2n3p1 / 4p3 / 4P3 / 2N2N2 / PPPP1PPP / R1BQKB1R w KQkq - 0 4 r1bqkbnr / pppp2pp / 2n5 / 4pp2 / 4P3 / 2N2N2 / PPPP1PPP / R1BQKB1R w KQkq - 0 4 r1bqk1nr / ppppbppp / 2n5 / 4p3 / 4P3 / 2N2N2 / PPPP1PPP / R1BQKB1R w KQkq - 0 4 r1b1kbnr / pppp1ppp / 2n2q2 / 4p3 / 4P3 / 2N2N2 / PPPP1PPP / R1BQKB1R w KQkq - 0 4 r1bqkbnr / pppp2pp / 2n2p2 / 4p3 / 4P3 / 2N2N2 / PPPP1PPP / R1BQKB1R w KQkq - 0 4 r1bqkbnr / pppp1pp1 / 2n4p / 4p3 / 4P3 / 2N2N2 / PPPP1PPP / R1BQKB1R w KQkq - 0 4 r1bqkb1r / ppppnppp / 2n5 / 4p3 / 4P3 / 2N2N2 / PPPP1PPP / R1BQKB1R w KQkq - 0 4 r1bqk1nr / pppp1ppp / 2nb4 / 4p3 / 4P3 / 2N2N2 / PPPP1PPP / R1BQKB1R w KQkq - 0 4 r1bqkbnr / p1pp1ppp / 1pn5 / 4p3 / 4P3 / 2N2N2 / PPPP1PPP / R1BQKB1R w KQkq - 0 4 r1bqkbnr / pppp1ppp / 8 / 4p3 / 3nP3 / 2N2N2 / PPPP1PPP / R1BQKB1R w KQkq - 0 4 r1bqkbnr / ppp2ppp / 2n5 / 3pp3 / 4P3 / 2N2N2 / PPPP1PPP / R1BQKB1R w KQkq - 0 4 r1b1kbnr / ppppqppp / 2n5 / 4p3 / 4P3 / 2N2N2 / PPPP1PPP / R1BQKB1R w KQkq - 0 4 r1bqkbnr / 1ppp1ppp / 2n5 / p3p3 / 4P3 / 2N2N2 / PPPP1PPP / R1BQKB1R w KQkq - 0 4 r1bqkb1r / pppp1ppp / 2n4n / 4p3 / 4P3 / 2N2N2 / PPPP1PPP / R1BQKB1R w KQkq - 0 4

r1bqkbnr / pppp1p1p / 2n5 / 4p1p1 / 4P3 / 2N2N2 / PPPP1PPP / R1BQKB1R w KQkq - 0 4?

r1bqkbnr / pppp1pp1 / 2n5 / 4p2p / 4P3 / 2N2N2 / PPPP1PPP / R1BQKB1R w KQkq - 0 4

r1bqkbnr / pppp1ppp / 8 / 4p3 / 1n2P3 / 2N2N2 / PPPP1PPP / R1BQKB1R w KQkq - 0 4?

r1bqkbnr / pppp1ppp / 8 / n3p3 / 4P3 / 2N2N2 / PPPP1PPP / R1BQKB1R w KQkq - 0 4?

r1bqkbnr / p1pp1ppp / 2n5 / 1p2p3 / 4P3 / 2N2N2 / PPPP1PPP / R1BQKB1R w KQkq - 0 4?

1rbqkbnr / pppp1ppp / 2n5 / 4p3 / 4P3 / 2N2N2 / PPPP1PPP / R1BQKB1R w KQk - 0 4

Ponziani opening

r1bqkb1r / pppp1ppp / 2n2n2 / 4p3 / 4P3 / 2P2N2 / PP1P1PPP / RNBQKB1R w KQkq - 0 4 r1bqkbnr / ppp2ppp / 2n5 / 3pp3 / 4P3 / 2P2N2 / PP1P1PPP / RNBQKB1R w KQkq - 0 4 r1bqk1nr / pppp1ppp / 2n5 / 2b1p3 / 4P3 / 2P2N2 / PP1P1PPP / RNBQKB1R w KQkq - 0 4 r1bqkbnr / ppp2ppp / 2np4 / 4p3 / 4P3 / 2P2N2 / PP1P1PPP / RNBQKB1R w KQkq - 0 4 r1bqkbnr / 1ppp1ppp / p1n5 / 4p3 / 4P3 / 2P2N2 / PP1P1PPP / RNBQKB1R w KQkq - 0 4 r1bqkbnr / pppp2pp / 2n5 / 4pp2 / 4P3 / 2P2N2 / PP1P1PPP / RNBQKB1R w KQkq - 0 4 r1bqkb1r / ppppnppp / 2n5 / 4p3 / 4P3 / 2P2N2 / PP1P1PPP / RNBQKB1R w KQkq - 0 4 r1bqk1nr / ppppbppp / 2n5 / 4p3 / 4P3 / 2P2N2 / PP1P1PPP / RNBQKB1R w KQkq - 0 4 r1bqkbnr / pppp1pp1 / 2n4p / 4p3 / 4P3 / 2P2N2 / PP1P1PPP / RNBQKB1R w KQkq - 0 4 r1bqkbnr / pppp1p1p / 2n3p1 / 4p3 / 4P3 / 2P2N2 / PP1P1PPP / RNBQKB1R w KQkq - 0 4 r1bqk1nr / pppp1ppp / 2nb4 / 4p3 / 4P3 / 2P2N2 / PP1P1PPP / RNBQKB1R w KQkq - 0 4 r1b1kbnr / pppp1ppp / 2n2q2 / 4p3 / 4P3 / 2P2N2 / PP1P1PPP / RNBQKB1R w KQkq - 0 4 r1bqkbnr / pppp2pp / 2n2p2 / 4p3 / 4P3 / 2P2N2 / PP1P1PPP / RNBQKB1R w KQkq - 0 4 r1bqkbnr / p1pp1ppp / 1pn5 / 4p3 / 4P3 / 2P2N2 / PP1P1PPP / RNBQKB1R w KQkq - 0 4 r1b1kbnr / ppppqppp / 2n5 / 4p3 / 4P3 / 2P2N2 / PP1P1PPP / RNBQKB1R w KQkq - 0 4

r1bqkbnr / p1pp1ppp / 2n5 / 1p2p3 / 4P3 / 2P2N2 / PP1P1PPP / RNBQKB1R w KQkq - 0 4?

r1bqkbnr / 1ppp1ppp / 2n5 / p3p3 / 4P3 / 2P2N2 / PP1P1PPP / RNBQKB1R w KQkq - 0 4 r1bqkbnr / pppp1p1p / 2n5 / 4p1p1 / 4P3 / 2P2N2 / PP1P1PPP / RNBQKB1R w KQkq - 0 4 r1bqkbnr / pppp1pp1 / 2n5 / 4p2p / 4P3 / 2P2N2 / PP1P1PPP / RNBQKB1R w KQkq - 0 4 r1bqkb1r / pppp1ppp / 2n4n / 4p3 / 4P3 / 2P2N2 / PP1P1PPP / RNBQKB1R w KQkq - 0 4

r1bqkbnr / pppp1ppp / 8 / n3p3 / 4P3 / 2P2N2 / PP1P1PPP / RNBQKB1R w KQkq - 0 4?

r1bq1bnr / ppppkppp / 2n5 / 4p3 / 4P3 / 2P2N2 / PP1P1PPP / RNBQKB1R w KQ - 0 4?

King Pawn Opening (3) d3)

r1bqkb1r / pppp1ppp / 2n2n2 / 4p3 / 4P3 / 3P1N2 / PPP2PPP / RNBQKB1R w KQkq - 0 4 r1bqk1nr / pppp1ppp / 2n5 / 2b1p3 / 4P3 / 3P1N2 / PPP2PPP / RNBQKB1R w KQkq - 0 4 r1bqkbnr / ppp2ppp / 2np4 / 4p3 / 4P3 / 3P1N2 / PPP2PPP / RNBQKB1R w KQkq - 0 4 r1bqkbnr / pppp1pp1 / 2n4p / 4p3 / 4P3 / 3P1N2 / PPP2PPP / RNBQKB1R w KQkq - 0 4 r1bqkbnr / ppp2ppp / 2n5 / 3pp3 / 4P3 / 3P1N2 / PPP2PPP / RNBQKB1R w KQkq - 0 4 r1bqkbnr / pppp2pp / 2n5 / 4pp2 / 4P3 / 3P1N2 / PPP2PPP / RNBQKB1R w KQkq - 0 4 r1bqkbnr / pppp1p1p / 2n3p1 / 4p3 / 4P3 / 3P1N2 / PPP2PPP / RNBQKB1R w KQkq - 0 4 r1bqk1nr / pppp1ppp / 2n5 / 4p3 / 1b2P3 / 3P1N2 / PPP2PPP / RNBQKB1R w KQkq - 0 4 r1bqk1nr / ppppbppp / 2n5 / 4p3 / 4P3 / 3P1N2 / PPP2PPP / RNBQKB1R w KQkq - 0 4 r1b1kbnr / pppp1ppp / 2n2q2 / 4p3 / 4P3 / 3P1N2 / PPP2PPP / RNBQKB1R w KQkq - 0 4 r1bqkbnr / pppp2pp / 2n2p2 / 4p3 / 4P3 / 3P1N2 / PPP2PPP / RNBQKB1R w KQkq - 0 4 r1bqkbnr / 1ppp1ppp / p1n5 / 4p3 / 4P3 / 3P1N2 / PPP2PPP / RNBQKB1R w KQkq - 0 4 r1bqkb1r / ppppnppp / 2n5 / 4p3 / 4P3 / 3P1N2 / PPP2PPP / RNBQKB1R w KQkq - 0 4 r1b1kbnr / ppppqppp / 2n5 / 4p3 / 4P3 / 3P1N2 / PPP2PPP / RNBQKB1R w KQkq - 0 4 r1bqkbnr / p1pp1ppp / 1pn5 / 4p3 / 4P3 / 3P1N2 / PPP2PPP / RNBQKB1R w KQkq - 0 4 r1bqkbnr / pppp1ppp / 8 / 4p3 / 3nP3 / 3P1N2 / PPP2PPP / RNBQKB1R w KQkq - 0 4 r1bqk1nr / pppp1ppp / 2nb4 / 4p3 / 4P3 / 3P1N2 / PPP2PPP / RNBQKB1R w KQkq - 0 4 r1bqkbnr / pppp1pp1 / 2n5 / 4p2p / 4P3 / 3P1N2 / PPP2PPP / RNBQKB1R w KQkq - 0 4 r1bqkb1r / pppp1ppp / 2n4n / 4p3 / 4P3 / 3P1N2 / PPP2PPP / RNBQKB1R w KQkq - 0 4

r1bqkbnr / pppp1p1p / 2n5 / 4p1p1 / 4P3 / 3P1N2 / PPP2PPP / RNBQKB1R w KQkq - 0 4?

r1bqkbnr / p1pp1ppp / 2n5 / 1p2p3 / 4P3 / 3P1N2 / PPP2PPP / RNBQKB1R w KQkq - 0 4

r1bqkbnr / pppp1ppp / 8 / 4p3 / 1n2P3 / 3P1N2 / PPP2PPP / RNBQKB1R w KQkq - 0 4?

r1bqkbnr / 1ppp1ppp / 2n5 / p3p3 / 4P3 / 3P1N2 / PPP2PPP / RNBQKB1R w KQkq - 0 4

GuardianRM commented 5 years ago

King Pawn Opening (3) a3)

r1bqkb1r / pppp1ppp / 2n2n2 / 4p3 / 4P3 / P4N2 / 1PPP1PPP / RNBQKB1R w KQkq - 0 4 r1bqk1nr / pppp1ppp / 2n5 / 2b1p3 / 4P3 / P4N2 / 1PPP1PPP / RNBQKB1R w KQkq - 0 4 r1bqkbnr / ppp2ppp / 2np4 / 4p3 / 4P3 / P4N2 / 1PPP1PPP / RNBQKB1R w KQkq - 0 4 r1bqkbnr / 1ppp1ppp / p1n5 / 4p3 / 4P3 / P4N2 / 1PPP1PPP / RNBQKB1R w KQkq - 0 4 r1bqkbnr / ppp2ppp / 2n5 / 3pp3 / 4P3 / P4N2 / 1PPP1PPP / RNBQKB1R w KQkq - 0 4 r1bqkbnr / pppp1pp1 / 2n4p / 4p3 / 4P3 / P4N2 / 1PPP1PPP / RNBQKB1R w KQkq - 0 4 r1bqkbnr / pppp2pp / 2n5 / 4pp2 / 4P3 / P4N2 / 1PPP1PPP / RNBQKB1R w KQkq - 0 4 r1bqk1nr / ppppbppp / 2n5 / 4p3 / 4P3 / P4N2 / 1PPP1PPP / RNBQKB1R w KQkq - 0 4 r1bqkbnr / 1ppp1ppp / 2n5 / p3p3 / 4P3 / P4N2 / 1PPP1PPP / RNBQKB1R w KQkq - 0 4 r1bqkbnr / pppp1p1p / 2n3p1 / 4p3 / 4P3 / P4N2 / 1PPP1PPP / RNBQKB1R w KQkq - 0 4

r1bqkbnr / pppp2pp / 2n2p2 / 4p3 / 4P3 / P4N2 / 1PPP1PPP / RNBQKB1R w KQkq - 0 4?

r1b1kbnr / pppp1ppp / 2n2q2 / 4p3 / 4P3 / P4N2 / 1PPP1PPP / RNBQKB1R w KQkq - 0 4 r1bqkbnr / pppp1ppp / 8 / 4p3 / 3nP3 / P4N2 / 1PPP1PPP / RNBQKB1R w KQkq - 0 4 r1bqkbnr / p1pp1ppp / 1pn5 / 4p3 / 4P3 / P4N2 / 1PPP1PPP / RNBQKB1R w KQkq - 0 4 r1bqk1nr / pppp1ppp / 2nb4 / 4p3 / 4P3 / P4N2 / 1PPP1PPP / RNBQKB1R w KQkq - 0 4 r1bqkb1r / ppppnppp / 2n5 / 4p3 / 4P3 / P4N2 / 1PPP1PPP / RNBQKB1R w KQkq - 0 4 r1bqkb1r / pppp1ppp / 2n4n / 4p3 / 4P3 / P4N2 / 1PPP1PPP / RNBQKB1R w KQkq - 0 4

r1bqkbnr / pppp1p1p / 2n5 / 4p1p1 / 4P3 / P4N2 / 1PPP1PPP / RNBQKB1R w KQkq - 0 4?

r1b1kbnr / ppppqppp / 2n5 / 4p3 / 4P3 / P4N2 / 1PPP1PPP / RNBQKB1R w KQkq - 0 4 r1bqkbnr / pppp1pp1 / 2n5 / 4p2p / 4P3 / P4N2 / 1PPP1PPP / RNBQKB1R w KQkq - 0 4

r1bqkbnr / p1pp1ppp / 2n5 / 1p2p3 / 4P3 / P4N2 / 1PPP1PPP / RNBQKB1R w KQkq - 0 4?

King Pawn Opening (3) Be2)

r1bqkb1r / pppp1ppp / 2n2n2 / 4p3 / 4P3 / 5N2 / PPPPBPPP / RNBQK2R w KQkq - 0 4 r1bqk1nr / pppp1ppp / 2n5 / 2b1p3 / 4P3 / 5N2 / PPPPBPPP / RNBQK2R w KQkq - 0 4 r1bqkbnr / ppp2ppp / 2np4 / 4p3 / 4P3 / 5N2 / PPPPBPPP / RNBQK2R w KQkq - 0 4 r1bqkbnr / ppp2ppp / 2n5 / 3pp3 / 4P3 / 5N2 / PPPPBPPP / RNBQK2R w KQkq - 0 4 r1bqkbnr / pppp1pp1 / 2n4p / 4p3 / 4P3 / 5N2 / PPPPBPPP / RNBQK2R w KQkq - 0 4 r1bqkbnr / pppp1p1p / 2n3p1 / 4p3 / 4P3 / 5N2 / PPPPBPPP / RNBQK2R w KQkq - 0 4 r1bqkbnr / pppp2pp / 2n5 / 4pp2 / 4P3 / 5N2 / PPPPBPPP / RNBQK2R w KQkq - 0 4 r1bqkb1r / ppppnppp / 2n5 / 4p3 / 4P3 / 5N2 / PPPPBPPP / RNBQK2R w KQkq - 0 4 r1bqk1nr / ppppbppp / 2n5 / 4p3 / 4P3 / 5N2 / PPPPBPPP / RNBQK2R w KQkq - 0 4 r1bqkbnr / 1ppp1ppp / p1n5 / 4p3 / 4P3 / 5N2 / PPPPBPPP / RNBQK2R w KQkq - 0 4 r1b1kbnr / pppp1ppp / 2n2q2 / 4p3 / 4P3 / 5N2 / PPPPBPPP / RNBQK2R w KQkq - 0 4 r1bqkbnr / pppp2pp / 2n2p2 / 4p3 / 4P3 / 5N2 / PPPPBPPP / RNBQK2R w KQkq - 0 4 r1bqk1nr / pppp1ppp / 2n5 / 4p3 / 1b2P3 / 5N2 / PPPPBPPP / RNBQK2R w KQkq - 0 4 r1bqk1nr / pppp1ppp / 2nb4 / 4p3 / 4P3 / 5N2 / PPPPBPPP / RNBQK2R w KQkq - 0 4 r1bqkbnr / pppp1ppp / 8 / 4p3 / 3nP3 / 5N2 / PPPPBPPP / RNBQK2R w KQkq - 0 4 r1bqkbnr / p1pp1ppp / 1pn5 / 4p3 / 4P3 / 5N2 / PPPPBPPP / RNBQK2R w KQkq - 0 4 r1bqkbnr / pppp1pp1 / 2n5 / 4p2p / 4P3 / 5N2 / PPPPBPPP / RNBQK2R w KQkq - 0 4 r1bqkb1r / pppp1ppp / 2n4n / 4p3 / 4P3 / 5N2 / PPPPBPPP / RNBQK2R w KQkq - 0 4 r1b1kbnr / ppppqppp / 2n5 / 4p3 / 4P3 / 5N2 / PPPPBPPP / RNBQK2R w KQkq - 0 4 r1bqkbnr / pppp1p1p / 2n5 / 4p1p1 / 4P3 / 5N2 / PPPPBPPP / RNBQK2R w KQkq - 0 4

King Pawn Opening (3) c4)

r1bqk1nr / pppp1ppp / 2n5 / 2b1p3 / 2P1P3 / 5N2 / PP1P1PPP / RNBQKB1R w KQkq - 0 4 r1bqkb1r / pppp1ppp / 2n2n2 / 4p3 / 2P1P3 / 5N2 / PP1P1PPP / RNBQKB1R w KQkq - 0 4 r1bqkbnr / ppp2ppp / 2np4 / 4p3 / 2P1P3 / 5N2 / PP1P1PPP / RNBQKB1R w KQkq - 0 4 r1bqkbnr / 1ppp1ppp / p1n5 / 4p3 / 2P1P3 / 5N2 / PP1P1PPP / RNBQKB1R w KQkq - 0 4 r1bqk1nr / pppp1ppp / 2n5 / 4p3 / 1bP1P3 / 5N2 / PP1P1PPP / RNBQKB1R w KQkq - 0 4 r1bqkbnr / pppp1ppp / 8 / 4p3 / 2PnP3 / 5N2 / PP1P1PPP / RNBQKB1R w KQkq - 0 4 r1bqkbnr / pppp1p1p / 2n3p1 / 4p3 / 2P1P3 / 5N2 / PP1P1PPP / RNBQKB1R w KQkq - 0 4 r1bqkbnr / p1pp1ppp / 1pn5 / 4p3 / 2P1P3 / 5N2 / PP1P1PPP / RNBQKB1R w KQkq - 0 4 r1bqkbnr / pppp2pp / 2n5 / 4pp2 / 2P1P3 / 5N2 / PP1P1PPP / RNBQKB1R w KQkq - 0 4 r1bqkbnr / pppp1pp1 / 2n4p / 4p3 / 2P1P3 / 5N2 / PP1P1PPP / RNBQKB1R w KQkq - 0 4 r1b1kbnr / pppp1ppp / 2n2q2 / 4p3 / 2P1P3 / 5N2 / PP1P1PPP / RNBQKB1R w KQkq - 0 4 r1bqkbnr / pppp2pp / 2n2p2 / 4p3 / 2P1P3 / 5N2 / PP1P1PPP / RNBQKB1R w KQkq - 0 4 r1bqkb1r / ppppnppp / 2n5 / 4p3 / 2P1P3 / 5N2 / PP1P1PPP / RNBQKB1R w KQkq - 0 4 r1bqk1nr / pppp1ppp / 2nb4 / 4p3 / 2P1P3 / 5N2 / PP1P1PPP / RNBQKB1R w KQkq - 0 4 r1bqk1nr / ppppbppp / 2n5 / 4p3 / 2P1P3 / 5N2 / PP1P1PPP / RNBQKB1R w KQkq - 0 4

r1bqkbnr / ppp2ppp / 2n5 / 3pp3 / 2P1P3 / 5N2 / PP1P1PPP / RNBQKB1R w KQkq - 0 4?

r1bqkbnr / 1ppp1ppp / 2n5 / p3p3 / 2P1P3 / 5N2 / PP1P1PPP / RNBQKB1R w KQkq - 0 4

r1bqkbnr / pppp1ppp / 8 / n3p3 / 2P1P3 / 5N2 / PP1P1PPP / RNBQKB1R w KQkq - 0 4?

r1bqkbnr / pppp1p1p / 2n5 / 4p1p1 / 2P1P3 / 5N2 / PP1P1PPP / RNBQKB1R w KQkq - 0 4

...

The book can be updated gradually. Perhaps someone can automate this instead of doing it by hand.

noobpwnftw commented 5 years ago

How about this? https://groups.google.com/forum/#!topic/fishcooking/I0sEZollSlY

GuardianRM commented 5 years ago

@noobpwnftw

Some time ago I had some thoughts on how to obtain a conditional solution of chess (I think I was not the only one to reason this way). You can read a bit here: https://github.com/glinscott/leela-chess/issues/482 https://github.com/glinscott/leela-chess/issues/489

(You can also read some of the discussion in posts ~274 to ~298.) http://immortalchess.net/forum/showthread.php?t=9&page=14

In any case, anyone who plays correspondence chess values position statistics based on a large number of games or on ratings. Game databases are currently in the public domain, as are 7-piece endgame tablebases. But it is rather strange that there is still no publicly available conditional solution of chess, although in theory about 100 million games with further refinement would be enough for this.

If your program is potentially capable of something like this, or of collecting statistics with specified parameters, or of building a book with a given number of positions from the starting position according to certain parameters, it would be a wonderful reference both for play and for the further development of chess in general.

noobpwnftw commented 5 years ago

@GuardianRM My program does exactly that, but it requires a significant amount of computation to overcome many kinds of bias (distribution of moves, imperfect eval, etc.).

GuardianRM commented 5 years ago

@noobpwnftw

If I understand correctly, the idea is to sort the options based on the evaluation function. This may indeed require a very large number of calculations. Even if the sorting is based on the evaluation function of Stockfish dev, this is an almost unimaginable task, since the evaluation function is still not perfect. As the search depth increases, the assessment fluctuates significantly depending on the initial position estimates. Also, if we analyze a database like LetsCheck, it is almost impossible to draw any definite conclusions.

It seems to me that a more accurate approach might be to play out a database of games with a little randomization at the beginning of each game to eliminate duplicate positions. Perhaps the most correct theoretical solution would be a large number of Stockfish official vs. Stockfish master games at short time controls (for example 10s + 0.1s), with the statistics updated periodically to select the line for subsequent games. If this feature is present, or could be, it deserves attention in any case.

If it becomes possible to download a program that plays games and sends the data to the server, it may be worth bringing the project to the attention of people who cover chess on YouTube. If the project has no additional technical difficulties for the audience to understand (for example, if it is always clear that the submitted games update the database on the server and update a conditional "Stockfish Perfect Book", and everyone can see the result of the process after each game or after a certain number of games), I think the idea will have a lot of followers; at least it should. Ideally there would also be an option to select a starting position for playing out statistics.

noobpwnftw commented 5 years ago

@GuardianRM Yes, it is based on a weighted average of moves scored by a depth-20 search, governed by a sample window decided dynamically by the best score. Move exploration is done by a slightly modified version of SF with parameters that basically guarantee a branching factor of at least 5; see here: https://github.com/noobpwnftw/Stockfish/tree/siever

There are three kinds of calculations:

  1. A normal SF that scores the moves at depth 20.
  2. A siever SF that picks a new move in case of a new position.
  3. Any engine that supports UCI, to play matches against the database and then self-play, uploading move discoveries.

De-duplication is done on the server, which always tries to give similar positions to the same worker so that the local TT can be useful. And yes, the database is constantly updated upon receiving results and can be used as a book in real time. I have put some effort into making the probing and data storage very fast and efficient.

The problem with WDL% statistics is that you will always get a biased number of samples, and the distribution of those biases decides the final %, while a straight engine evaluation sampled from many children is more accurate and can serve the same purpose. In principle they are different things, but in practice I just convert the score using 100 / (1 + exp(-score / 330)).
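The score-to-percentage conversion quoted in this comment is a logistic curve. A minimal Python sketch, assuming the score is in centipawns from the side to move (the comment does not state the unit) and using a hypothetical function name:

```python
import math

def score_to_percent(score: float, scale: float = 330.0) -> float:
    """Map an engine score to an expected result in percent.

    Assumption: `score` is in centipawns from the side to move.
    The constant 330 is the one quoted in the comment.
    """
    return 100.0 / (1.0 + math.exp(-score / scale))

for cp in (-330, 0, 100, 330):
    print(cp, round(score_to_percent(cp), 1))
```

A score of 0 maps to exactly 50%, and the curve is symmetric: the percentages for `+x` and `-x` always add up to 100.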

GuardianRM commented 5 years ago

@noobpwnftw

When I play correspondence chess, I basically look at the 2-3 moves with the best % statistics in the current version (most often they coincide with the most popular ones). Then I move one step forward and look at the possible options in the same way. To select the next move, 2-4 steps are usually enough; the move is chosen based on the best final % of the leaves.

Usually a small manual enumeration of the tree is enough, since it is almost impossible to predict the opponent's move. This approach is used until about 100 positions remain, then I start using the engine in infinite mode.

I have no special method for calculating the conditional score, but I can say that up to a certain point this method is much more efficient than using the LetsCheck evaluations.

The main problem is the insufficient number of games, since in some variations the 100-game threshold is reached quite early.

noobpwnftw commented 5 years ago

To automate such manual steps, leaves are aggregated to their roots via averaging, so that the score at the root is no longer the minimax score of the best leaf but takes the possibilities and alternative lines into account. In this way the final game outcome becomes less important, and the eval score of a single leaf also becomes less important; all that really matters is the tree structure. One can take the tree structure and re-evaluate the leaves with a different engine (I have tried this once), yet the score distribution seen at the root is not much affected. If one has a very wide move sieving window, then it is very hard to shape the tree very differently, since there are only so many "good" moves that really matter.
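A rough Python sketch of this kind of aggregation, for illustration only. The thread does not give the actual weights or the rule for sizing the window, so the fixed `window`, the linear weighting, and the names `Node` and `backed_up_score` are all assumptions rather than the database's real method:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Node:
    # Evaluation in centipawns from the side to move; used only at leaves.
    leaf_score: float = 0.0
    children: List["Node"] = field(default_factory=list)

def backed_up_score(node: Node, window: float = 50.0) -> float:
    """Weighted average of the child scores within `window` of the best one.

    `window` must be positive. Both the fixed window and the weights below
    are hypothetical stand-ins for the unpublished formula.
    """
    if not node.children:
        return node.leaf_score
    # A child's score is from the opponent's point of view, so negate it.
    scores = [-backed_up_score(child, window) for child in node.children]
    best = max(scores)
    kept = [s for s in scores if best - s <= window]
    # Weight 1.0 for the best move, falling linearly to 0.5 at the window edge.
    weights = [1.0 - 0.5 * (best - s) / window for s in kept]
    return sum(w * s for w, s in zip(weights, kept)) / sum(weights)

# Three replies worth +30, +10 and -200 for the side to move at the root.
root = Node(children=[Node(leaf_score=-30.0), Node(leaf_score=-10.0), Node(leaf_score=200.0)])
print(backed_up_score(root))
```

In this toy tree the clearly losing reply falls outside the window and is ignored, while the second-best reply pulls the root score below the pure minimax value of +30.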

GuardianRM commented 5 years ago

Still, it seems to me that there is some catch in using infinite analysis and averaging the estimates. At least most correspondence players try to use minimax for as long as possible, since it gives confidence about what percentage of points you can get from a certain position after a few moves. That is, for example, if there are positions with an average of 45% and a guaranteed 43% for Black, but in the first case there is a risk of an intermediate move after which White scores 60-65%, then the safer option that guarantees 43% will be chosen, even if its average score and its frequency of occurrence in games are lower.

For example, if I play the Sicilian Defense as Black:

  1. e4 c5 2. Nf3 d6 * (~389k, 52.2% for White)
  1. e4 c5 2. Nf3 Nc6 * (~199k, 52.9% for White)
  1. e4 c5 2. Nf3 e6 * (~91k, 53.5% for White)

I more often choose 1. e4 c5 2. Nf3 e6 *, because in the other lines there are intermediate variations where White gets a larger guaranteed % advantage by moves 6-8 than the edge White obtains in the third case.
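The difference between choosing by average and choosing by guaranteed result (the 45% versus 43% case) can be shown with a few lines of Python. The percentages below are made up to match that description and are not taken from any database:

```python
def mean(values):
    return sum(values) / len(values)

# Black's expected score (%) after each plausible White reply.
# Hypothetical numbers: option A averages 45% but has one reply where
# White scores 62%; option B never drops below 43%.
options = {
    "A": [50.0, 47.0, 38.0],
    "B": [44.0, 43.0, 43.0],
}

by_average = max(options, key=lambda name: mean(options[name]))
by_guarantee = max(options, key=lambda name: min(options[name]))

print("best by average:", by_average)
print("best by guaranteed score:", by_guarantee)
```

Averaging prefers option A, while the minimax-style rule prefers option B because its worst case is better.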

noobpwnftw commented 5 years ago

That is the reason why a weighted average is preferred; however, how much weight to apply is unknown, and I used a somewhat strange formula based on my experience. It may not be immediately obvious for the scenario you mentioned if it is only a few moves back and forth, but if the entire tree is propagated in this way, then it becomes more obvious that the score distribution at the root has the statistics you are looking for. It is hard to write a "golden formula" for all scenarios, or to remove distribution bias without a huge number of samples chosen at random, so for now I have started building the tree and am waiting for the day it starts to render something useful.

GuardianRM commented 5 years ago

It seems to me that only large samples of random strong games begin to shed light on the real assessment of a position, since choosing an opening move by engine evaluation at a given depth, even a sufficiently large one, often does not work.

Offhand, the reliability ratio is ~1/10 in favor of practical games up to a certain point, until the available games run out, since position evaluation by engines is still somewhat chaotic.

The evaluation of the same opening position by Stockfish dev can vary within +-0.3-0.5 over time. An example position from practice:

rnbqkb1r / ppp2ppp / 5p2 / 3P4 / 5B2 / 8 / PPP2PPP / R2QKBNR b KQkq - 0 7

Different versions of Stockfish dev over the last six months evaluate it inconsistently at the same depth, for example at depths of ~35-45, with a fairly high +- error.

But there are almost no games from which to evaluate the position in terms of WDL probabilities.

noobpwnftw commented 5 years ago

I also have a tool for this problem: people can input their desired starting position and then let engines self-play while updating the database, in order to explore or refute known moves until the samples are sufficient to draw conclusions about that position.

There is actually nothing wrong with engines favoring certain moves in most positions, provided the engine thinks the move is "sufficiently" better; in reality this also happens. The problem is that for an engine to search deep under normal time controls it must prune a lot of moves, while in a database we do not care about time, so we can force it to prune less and explore moves that are considered worse.

Let's say, for example, that SF normally searches 20% of the possible moves; then we can explore 50% or so, while 100% means perft. It is also hard to come up with a ratio, but what we can do is not waste the calculations already done, by saving them into a big database.

GuardianRM commented 5 years ago

To be honest, I almost never play out games for statistics from the current position, since I do not play many correspondence games (an acceptable game takes quite a lot of time), but I occasionally use step-by-step analysis in LetsCheck. Nevertheless, if I had the opportunity to play out statistics from games for a certain position and they were stored as in LetsCheck, I think I would use this periodically during analysis, as a supplement to infinite analysis.

noobpwnftw commented 5 years ago

My suggestion is that we could make a fork like BrainFish, with the database (book) functions built in, and run a cloud service for it. Most of these functions belong in a GUI or other front-end application, but they can be best incorporated if they are built into the engine itself. There is also a lot of parameter fine-tuning to do on the database side, as well as data mining from the results.

mcostalba commented 5 years ago

Do you know this?

https://www.sp-cc.de/drawkiller-openings.htm

ghost commented 5 years ago

I'm using drawkiller_tournament.epd for blitz testing. It really does reduce draws significantly compared to other books; however, it gets somewhat different results from the 2moves book.

snicolet commented 5 years ago

@mcostalba Yes, I think it would be interesting to at least try to integrate the draw-killer book on fishtest and run some tests with it to compare with our current other books.

mcostalba commented 5 years ago

@snicolet from the point of view of fishtest, a good book is the one that maximizes the test resolution.

So to measure the effectiveness of a book in a sound way we could pick a set of patches (green and red ones) and run the SPRT tests 2 times, with the default 2 moves book and with the new candidate book.

The best book is the one that minimizes the total number of games.

Unfortunately, in the above scenario it is difficult to pick a sensible number of patches (5, 10, 50, ?). Maybe a mathematician could help here.

A simpler setup would be to run a test with a fixed number of games (say 20000) against master and compute the sum of the squares of the Elo results; the book with the biggest score wins.

For instance if we pick 5 green patches and test against master and the results are:

book 1: +2 ELO, +1.5, +2, +1, +1    = 12.25
book 2: +2 ELO, +1, +1.2, +1, +0.9  =  8.25

Then book 1 is the best for testing.
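The scoring rule can be written out in a few lines of Python. This is only a sketch of the proposal as described: the function name is hypothetical and the gains are the five example values per book from this comment.

```python
def book_score(elo_gains):
    """Sum of squared Elo results for one book (larger means better resolution)."""
    return sum(gain * gain for gain in elo_gains)

# Elo results of the same five green patches, measured with each book.
book1 = [2.0, 1.5, 2.0, 1.0, 1.0]
book2 = [2.0, 1.0, 1.2, 1.0, 0.9]

scores = {"book 1": book_score(book1), "book 2": book_score(book2)}
best = max(scores, key=scores.get)
print(scores, "->", best)
```

Squaring rewards books that spread the measured Elo results further from zero, which is the property the proposal is after.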

I am not sure how many patches we need; off the top of my head, I would say 10 patches could be enough.

dorzechowski commented 5 years ago

Resolution of an opening book can be measured by normalized Elo, see: http://hardy.uhasselt.be/Toga/normalized_elo.pdf So we could take two close SF versions (as we typically have 1-2 Elo patches) and measure it with a fixed number of games at different TCs.

There are some caveats about drawkiller openings though. They never test a typical opening phase, as both kings are always manually castled and stay in opposite corners on the long diagonals (a1-h8 or h1-a8), pieces are shuffled on the 1st rank, and a lot of pawns are moved. So, for example, patches related to castling, pieces trapped by an uncastled king, getting the king to safety, etc. cannot be reliably tested. Moreover, drawkiller_tournament.epd contains only around 6800 positions, which may be enough, but I'm not sure. I think it's probably a good book for rating lists but not necessarily for testing patches.

ghost commented 5 years ago

@dorzechowski There is drawkiller_big with nearly 30k positions. It's possible to make a hybrid 2moves+drawkiller book to add these positions.

miguel-l commented 5 years ago

Just a random idea: do we even need a book? We can get some randomness by enabling some "skill" option for the first few plies of a game and letting the engine wander into some opening by itself. I would think this avoids significantly one-sided openings, and to me it seems more natural, although I'm not sure if it's a good idea with regard to test resolution.

dorzechowski commented 5 years ago

I did some measurements with books 2moves_v1 and drawkiller_normal. TC 10+0.1, SF 010119 vs SF 291118 (the same as SF 10), no TB, no adjudication (except draw after 250 moves).

Games Completed = 10000 of 10000 (Avg game length = 15.532 sec)
Settings = Gauntlet/32MB/1000ms+100ms/M 1000cp for 100 moves, D 250 moves/EPD:drawkiller_normal.epd(13318)
Time = 158229 sec elapsed, 0 sec remaining
 1.  Stockfish 010119           5079.5/10000    3351-3192-3457      (L: m=3192 t=0 i=0 a=0) (D: r=1537 i=1250 f=600 s=61 a=9)   (tpm=103.5 d=21.08 nps=2921662)
 2.  Stockfish 291118           4920.5/10000    3192-3351-3457      (L: m=3351 t=0 i=0 a=0) (D: r=1537 i=1250 f=600 s=61 a=9)   (tpm=103.2 d=20.73 nps=2982810)

Games Completed = 10000 of 10000 (Avg game length = 18.199 sec)
Settings = Gauntlet/32MB/1000ms+100ms/M 1000cp for 100 moves, D 250 moves/EPD:2moves_v1.epd(32000)
Time = 184913 sec elapsed, 0 sec remaining
 1.  Stockfish 010119           5136.5/10000    2762-2489-4749      (L: m=2489 t=0 i=0 a=0) (D: r=1992 i=1697 f=974 s=69 a=17)  (tpm=103.3 d=21.16 nps=2935270)
 2.  Stockfish 291118           4863.5/10000    2489-2762-4749      (L: m=2762 t=0 i=0 a=0) (D: r=1992 i=1697 f=974 s=69 a=17)  (tpm=103.2 d=20.78 nps=3011682)

Even though drawkiller book lowers draw ratio significantly, the resolution is worse:

vdbergh commented 5 years ago

Even though drawkiller book lowers draw ratio significantly, the resolution is worse:

Note that by itself, lowering the draw ratio is bad for resolution (it is obvious that converting a draw into a win or loss, with 50% probability each, just increases noise: you can achieve the same result by simply tossing a coin). So lowering the draw ratio should be offset by an increase in elo (the precise statement is that elo/sqrt(1-d) should remain at least constant). This is what happens for the 2 moves versus the 8 moves book for example.
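
A rough sketch of this argument, using the W/L/D counts from the two 10000-game runs quoted earlier in the thread (from the point of view of Stockfish 010119). The helper names are hypothetical, the logistic score-to-Elo conversion is assumed, and these are point estimates only, without error bars.

```python
import math


def resolution_proxy(wins, losses, draws):
    """elo / sqrt(1 - d) computed from a W/L/D record."""
    games = wins + losses + draws
    score = (wins + 0.5 * draws) / games
    elo = -400.0 * math.log10(1.0 / score - 1.0)
    return elo / math.sqrt(1.0 - draws / games)


def coin_toss(wins, losses, draws, fraction):
    """Turn a fraction of the draws into wins and losses, half each."""
    flipped = draws * fraction
    return wins + flipped / 2, losses + flipped / 2, draws - flipped


two_moves = (2762, 2489, 4749)
drawkiller = (3351, 3192, 3457)

print("2moves_v1:             ", round(resolution_proxy(*two_moves), 2))
print("2moves_v1, coin-tossed:", round(resolution_proxy(*coin_toss(*two_moves, 0.5)), 2))
print("drawkiller_normal:     ", round(resolution_proxy(*drawkiller), 2))
```

The coin toss leaves the score (and hence elo) unchanged while lowering the draw ratio, so the proxy can only go down.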

I have now understood this in a quite formal way. Resolution (or sensitivity, the terminology is not completely fixed) should be thought of as the amount of effort it takes to separate two engines with given Type I/II error probabilities (i.e. the expected length of a test). Of course this effort also depends on the chosen engines but the idea is that the relative effort between two testing setups should only depend weakly on the chosen engines (I call this the weak dependency hypothesis).

This is worked out in the new section 5 of my working document

http://hardy.uhasselt.be/Toga/normalized_elo.pdf

I define the notion of a context of a test (e.g. an opening book) and then the concept of relative sensitivity between two contexts.

Under the weak dependency hypothesis it is shown in Theorem 5.1.4 that the relative amount of effort it takes to separate two engines under two different contexts is inversely quadratic in the relative sensitivity of the contexts.

Here is an easy example. There were regression tests of sf9->sf10 using the 2 moves and the 8 moves books. The outcomes were

W,L,D=9754,3612,26634 # LTC test (sf9->sf10) with 8 moves book.
W,L,D=12041,4583,23376 # LTC test (sf9->sf10) with 2 moves book

A simple computation then shows that the relative sensitivity (approximately elo1*sqrt(1-d2)/(elo2*sqrt(1-d1))) of the 2 moves book versus the 8 moves book is with 95% confidence in the interval [1.04, 1.15].

In other words the assumption that the 2 moves book is better than the 8 moves book for fishtest appears to be correct. This test appears to show that the reduction in games achievable by using the 2 moves book (at LTC), without sacrificing power, would be between 8% and 24% (using the above mentioned fact that the relative effort is inversely quadratic in the relative sensitivity).
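
For illustration, here is a sketch of the point estimate behind this computation, using the approximate formula quoted above and the two W,L,D triples. It does not reproduce the confidence interval (that needs the machinery of the linked document), and the function names are hypothetical.

```python
import math


def sensitivity_proxy(wins, losses, draws):
    """Approximate sensitivity of a test context: elo / sqrt(1 - d)."""
    games = wins + losses + draws
    score = (wins + 0.5 * draws) / games
    elo = -400.0 * math.log10(1.0 / score - 1.0)
    return elo / math.sqrt(1.0 - draws / games)


def games_saved(relative_sensitivity):
    """Relative effort is inversely quadratic in the relative sensitivity."""
    return 1.0 - 1.0 / relative_sensitivity ** 2


eight_moves = sensitivity_proxy(9754, 3612, 26634)
two_moves = sensitivity_proxy(12041, 4583, 23376)
relative = two_moves / eight_moves

print(f"relative sensitivity (2 moves vs 8 moves): {relative:.3f}")
for bound in (1.04, 1.15):
    print(f"relative sensitivity {bound}: {games_saved(bound):.1%} fewer games")
```

The point estimate lands inside the quoted interval, and the two interval bounds map to the quoted range of savings.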

I expressly wrote "would be" since there are some obvious caveats in interpreting these results. Since fishtest uses the 2 moves book for testing patches there may be a form of selection bias going on. Patches that work well with the 2 moves book are more likely to make it into master, possibly inflating (normalized) elo when measured with the 2 moves book.

The conclusion also depends on the weak dependency hypothesis and it is unknown (at least to me) how reasonable this hypothesis is when applied to engines as different as sf9 and sf10 versus engines that typically differ by a single small patch. Unfortunately relative sensitivity is extremely hard to measure accurately with engines that are very close in elo. As one can see in the above example, even with engines that differ by about 60 elo and a measurement consisting of 40000 games one still obtains an unpleasantly large confidence interval for the relative sensitivity of the 2 moves book versus the 8 moves book.

vdbergh commented 5 years ago

Here is another example of relative sensitivity: contempt versus no contempt. Here are results of two regression tests

W,L,D=11182,5189,23629 # sf9->sf10, STC, no contempt, 2 moves book, 4Mb hash
W,L,D=14256,6190,19554 # sf9->sf10, STC, contempt, 2 moves book, 4Mb hash

We see that contempt increases elo (good for sensitivity) and decreases the draw ratio (bad for sensitivity). Since these are two opposing effects one has to carefully weigh them.

The 95% confidence interval for the relative sensitivity of contempt versus no contempt (measured with the above data) is [1.16,1.28], which amounts to a saving in games between 25% and 40% (with the same caveats as above). I am not a fan of contempt but this calculation is evidence that its use on the framework may be justified.
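
As a cross-check that avoids the elo/sqrt(1-d) approximation, one can compare per-game signal-to-noise ratios directly. This sketch assumes a plain trinomial model (each game scored 1, 0.5 or 0) and takes the sensitivity to be (mean score - 0.5) / standard deviation; the exact definition is in the linked document, so treat the function name and the simplification as assumptions.

```python
import math


def normalized_score(wins, losses, draws):
    """(mean score - 0.5) / per-game standard deviation, trinomial model."""
    games = wins + losses + draws
    mean = (wins + 0.5 * draws) / games
    second_moment = (wins + 0.25 * draws) / games
    sigma = math.sqrt(second_moment - mean * mean)
    return (mean - 0.5) / sigma


no_contempt = normalized_score(11182, 5189, 23629)
contempt = normalized_score(14256, 6190, 19554)
relative = contempt / no_contempt

print(f"no contempt: {no_contempt:.4f}")
print(f"contempt:    {contempt:.4f}")
print(f"relative sensitivity: {relative:.3f}")
```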

EDIT. Note that I took care to do the no contempt test with 4Mb hash. Recently the default hash size for STC was increased to 8Mb (without proper justification AFAICS).

ghost commented 5 years ago

@vdbergh The hash size was increased because 8mb is stronger vs 4mb. https://github.com/glinscott/fishtest/issues/336#event-2049617786

This test is the justification:

LLR: 2.96 (-2.94,2.94) [0.00,4.00]
Total: 18116 W: 4089 L: 3853 D: 10174
LLR 2.96 [-2.94,2.94] (accepted)
Elo 4.07 [0.55,7.51] (95%)
LOS 98.8%
Games 18116 [w:22.6%, l:21.3%, d:56.2%]
http://tests.stockfishchess.org/tests/view/5c0652900ebc5902bcee542b

vdbergh commented 5 years ago

Well to objectively measure the effect on sensitivity one would have to assume both engines use 8Mb... A change in testing conditions which raises the base line for all contestants by a meager 4 elo is unlikely to have any substantial effect on the functioning of the framework.

Now I agree the change is likely to be harmless so it does not matter much. But it messes up things if one wants to compare newer results with older results which were obtained with 4Mb.

NKONSTANTAKIS commented 5 years ago

The way to increase resolution is not to lower drawrate, but to lower the level of determinism in result. Worst possible are openings which lead to 100% win or 100% draw, equally bad.

Ideal would be a position scoring 33% 33% 33%. Such a position would be so sensitive that changed parameters in a wide variety of topics would show. Such positions could (and should) appear multiple times in the book.

For versatility we can use all known good books and cut out everything below a threshold, ie the most dominant outcome should be <50%. Then we will have a super book.

The results that interest us for editing are only statistics of SFdev vs SFdev, so we could just play out every opening 100 times and make our selection. We can be that frugal in the number of playouts because noise is not a concern; it will just lead to the inclusion of some worse-rated openings, i.e. 60% ones or a few 70% ones, which is negligible for the overall efficiency.

MichaelB7 commented 5 years ago

@NKONSTANTAKIS - Good post - your comment is 100% correct. We want positions that are sensitive and can go either way. Good topic as well for the team to discuss. So thanks to all for your individual contributions to this discussion.

Alayan-stk-2 commented 5 years ago

The way to increase resolution is not to lower drawrate, but to lower the level of determinism in result. Worst possible are openings which lead to 100% win or 100% draw, equally bad.

Agreed.

There is quite a bunch of very one sided positions, and for the purpose of finding if a patch helps or not those are just wasted games. If a patch is bad enough to noticeably lower winrate in those positions, it will be destroyed in more balanced positions anyway.

Another concern is tuning. Tuning is already an expensive process, but garbage openings are polluting the process by making the changes due to variable changes harder to detect, and there is the (hard to quantify) concern that values are dragged towards being better at very dubious openings (where at more decent TC than fishtesting, SF is good enough). Follow-ups on the most popular 2 moves variation would likely deserve additional weight in the testing process.

MJZ1977 commented 5 years ago

My personal opinion about the current opening book is that a lot of positions are far from normal chess play, and this can impact the tuned values. One test we can do is to tune some parameters with a different opening book closer to normal chess (we could take the TCEC one?) and see whether these parameters change a lot or not.

vdbergh commented 5 years ago

@Alayan-stk-2 Biases in opening positions, as long as they are not extreme, are in fact quite innocent if the pentanomial model is used. Under the pentanomial model the bias will cancel out. Below is a graph (constructed using the BayesElo model) of sensitivity versus bias for drawelo in {220,280,315}={STC,LTC,VLTC}.

[Figure: bias versus sensitivity]

So reasonable amounts of bias are even beneficial when the draw ratio is high. This fact was already observed long ago by Kai Laskos on TalkChess.

Alayan-stk-2 commented 5 years ago

@Alayan-stk-2 Biases in opening positions, as long as they are not extreme, are in fact quite innocent if the pentanomial model is used.

"As long as they are not extreme" is an important part.

A 50% win 40% draw 10% loss opening is biased but not extremely so. A position which is too quiet has the risk of being too drawish in testing.

But a say 90% win 10% draw opening is extremely so. We're talking about positions where SF3 could get wins over ~SF10 at 20+0.2. I don't know of a way to provide a custom book at fishtest, but we could do some more detailed tests with positions like this : rnbqkbnr/ppp1pp1p/6p1/3p4/8/7N/PPPPPPPP/RNBQKBR1 w Qkq -

Now, I'm not sure where to draw the line between what is too biased and what is not, but I don't think a position like the FEN above will pick up any SF improvement or regression that won't be picked up by saner 2-movers.

And my concern for tuning and for insufficient weight of more solid openings remain regardless of SPRT resolution.

NKONSTANTAKIS commented 5 years ago

@vdbergh Just using the draw ratio leaves out crucial information: how the non-draw outcomes split between white and black wins! 56% draw with 44% white win is very different to 22% 56% 22%. Those results can be attributed to low draw openings usually being one-sided, while low draw openings which equally produce both color wins are bound to be exceptional but rare. So yes, a 50% 50% 0% will be probably worse than a 15% 70% 15%, but possibly better than a 5% 90% 5%.

So, with these in mind, we can add a low % precondition to our high % one in filtering the book: ie Least expected outcome > 15% + highest < 70%.

The high resolution of the high drawrates can be attributed to the longer, non forced games, which give room for error for both sides, especially at those low TC's. While a low drawrate is by far more likely to indicate that one side has a huge advantage (low res) than that the position is so lively that both sides can win it (high res). Indeed it makes a lot of sense that a 60% 20% 20% opening would be inferior to a 20% 60% 20%. Now we can add a 3rd requirement:

W(dominant) Win % < 40%

So with all 3 combined, the worst possible 1-side winning bias we can have is 40% 45% 15%, the worst drawish bias 15% 70% 15%, and the highest bipolarity 40% 20% 40%

So the possible ranges are: 15-40%, 20-70%, 15-40%
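
A possible sketch of this three-part filter, applied to per-opening outcome frequencies (for example from 100 SFdev vs SFdev playouts). The function name, the default thresholds as keyword arguments and the use of strict inequalities at the boundaries are assumptions made for illustration.

```python
def passes_filter(white_win, draw, black_win,
                  min_rarest=0.15, max_most_common=0.70, max_win=0.40):
    """Return True if an opening's outcome frequencies pass all three rules.

    1. the least expected outcome occurs more than `min_rarest` of the time
    2. the most common outcome occurs less than `max_most_common` of the time
    3. the dominant side wins less than `max_win` of the time
    """
    outcomes = (white_win, draw, black_win)
    return (min(outcomes) > min_rarest
            and max(outcomes) < max_most_common
            and max(white_win, black_win) < max_win)


def frequencies(white_wins, draws, black_wins):
    """Convert playout counts into outcome frequencies."""
    total = white_wins + draws + black_wins
    return white_wins / total, draws / total, black_wins / total


# Hypothetical playout counts (white wins, draws, black wins) per opening.
playouts = {
    "balanced and lively": (33, 34, 33),
    "one-sided": (90, 10, 0),
    "dead draw": (5, 90, 5),
    "bipolar but white-heavy": (60, 20, 20),
}
for name, counts in playouts.items():
    print(name, passes_filter(*frequencies(*counts)))
```

With only 100 playouts per opening the frequencies are noisy, so openings close to a threshold will be classified somewhat arbitrarily.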

I regard it as beyond doubt that such a filtering will do wonders.

vdbergh commented 5 years ago

@NKONSTANTAKIS The draw ratio in the graph is for a position without bias. The graph was made using the BayesElo model

win=L(elo-drawelo+bias)
loss=L(-elo-drawelo-bias)
draw=1-win-loss

where L is the logistic function L(x)=1/(1+10^(-x/400)). Here elo is expressed in BayesElo. I took elo=0.1 since one is only interested in relative sensitivity (bias versus no bias). I also used the pentanomial model where the games in a game pair are combined to calculate the variance of the score (the code for this is ready and running on the dev server, the only thing left to do is documenting it).
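
A minimal sketch of how such a curve could be computed from the model above. It assumes the two games of a pair are independent, that the bias flips sign when colours are reversed, and that sensitivity is the pair's signal-to-noise ratio; these are assumptions of the sketch, not the dev server code, and all function names are hypothetical.

```python
import math


def logistic(x):
    return 1.0 / (1.0 + 10.0 ** (-x / 400.0))


def game_probs(elo, drawelo, bias):
    """Win, draw and loss probabilities in the BayesElo model."""
    win = logistic(elo - drawelo + bias)
    loss = logistic(-elo - drawelo - bias)
    return win, 1.0 - win - loss, loss


def mean_and_variance(win, draw, loss):
    mean = win + 0.5 * draw
    return mean, win + 0.25 * draw - mean * mean


def pair_sensitivity(elo, drawelo, bias):
    """Signal-to-noise ratio of a game pair played with reversed colours."""
    # Game 1: the engine has the favoured colour (+bias); game 2: reversed (-bias).
    m1, v1 = mean_and_variance(*game_probs(elo, drawelo, bias))
    m2, v2 = mean_and_variance(*game_probs(elo, drawelo, -bias))
    return (m1 + m2 - 1.0) / math.sqrt(v1 + v2)


def relative_sensitivity(drawelo, bias, elo=0.1):
    """Sensitivity with bias relative to the unbiased case."""
    return pair_sensitivity(elo, drawelo, bias) / pair_sensitivity(elo, drawelo, 0.0)


for drawelo in (220, 280, 315):
    row = [round(relative_sensitivity(drawelo, b), 3) for b in (0, 50, 100, 200, 400)]
    print(drawelo, row)
```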

In the graph I have assumed that the bias is constant which is of course not true for an opening book. It would be interesting to see how different the result would be if we let the bias be variable. I have some reason to believe the RMS bias ( sqrt E(bias^2)) will be the dominating factor but for now this should be regarded as speculation.

dorzechowski commented 5 years ago

@vdbergh What would be the correct method of estimating/measuring the bias of a book?

MJZ1977 commented 5 years ago

I am just repeating what somebody said before, but for me the % of draws is not the most important thing. The most important is to have positions close to normal chess. We can have a French or King's Indian position with +1 for white if we need a biased opening position.

vdbergh commented 5 years ago

@dorzechowski Thank you for the interesting question. I will write a reply (I have some data on the 2moves book), but I have been busy the last couple of days fighting with some upcoming deadlines.

dorzechowski commented 5 years ago

I just published on fishtest my attempt to improve the 2moves_v1 book: https://groups.google.com/d/msg/fishcooking/cO5bF2_a6Ow/7A7xoIY5BQAJ

vdbergh commented 5 years ago

@dorzechowski Test running :)

http://dfts-0.pigazzini.it/tests/stats/5c6e5079ded32276bb0cf32c

(see the last line).

Note that there is no confidence interval (there is not really enough information to compute one, although one can make some back-of-the-envelope estimations). One has to wait for several tens of thousands of games before the result can be considered somewhat reliable.

dorzechowski commented 5 years ago

@vdbergh That's great, I'm already curious about conclusions. Could you make the script that calculates all those things available? I would do my own measurements as well.

vdbergh commented 5 years ago

@dorzechowski It should be available soon I hope. The idea is that fishtest will convert to pentanomial statistics which are more accurate and result in a saving of 10%-15% in games. The code for this (which requires both worker and server changes) is ready and has been running on the dev server for a while, apparently without any issues. The only thing that is left to do is to write proper documentation for the formulas that are used (I am currently doing this, but this and next week I will be very busy).

The page I linked to is basically an audit page. The bias calculation was added as an afterthought when I noticed that the difference between the pentanomial and trinomial variance is given by E(bias^2) (I call this an "accounting identity" since it amounts to collecting terms in a sum in two different ways). Currently the RMS bias (sqrt E(bias^2)) stands at about 75 elo. We have

E(bias^2)=E(bias)^2+Variance(bias).

Unfortunately there is not enough information to calculate the individual terms (the worker would have to send more information to do this, but it is too late now to make such changes). It is however a good guess that E(bias) is just the side to move bias which should be something like 30 elo.

Then we find Var(bias)=75**2-30**2=4725 and hence sigma(bias)=68.7. So assuming the bias is normally distributed (which probably does not quite hold) we find that with 95% certainty the bias of individual positions would be between 30-2*68.7=-107 and 30+2*68.7=167. So it seems the filtering has done its job (the 75% score cutoff corresponds to 190 elo).
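
The arithmetic above can be replayed as follows. The 75 elo RMS bias comes from the audit page and the 30 elo mean bias is the guess stated above; the function names are made up, and the score-to-Elo conversion assumes the usual logistic formula.

```python
import math


def bias_spread(rms_bias, mean_bias):
    """Split E(bias^2) = E(bias)^2 + Var(bias) and return (variance, sigma)."""
    variance = rms_bias ** 2 - mean_bias ** 2
    return variance, math.sqrt(variance)


def score_to_elo(score):
    return -400.0 * math.log10(1.0 / score - 1.0)


variance, sigma = bias_spread(75.0, 30.0)
low, high = 30.0 - 2.0 * sigma, 30.0 + 2.0 * sigma

print(f"Var(bias) = {variance:.0f}, sigma(bias) = {sigma:.1f}")
print(f"approximate 95% range: {low:.0f} to {high:.0f} elo")
print(f"75% score cutoff = {score_to_elo(0.75):.0f} elo")
```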

Note: I have very extensive statistics on the original 2moves book (obtained in a different way) but I have to dig those up.

Note: I also plan to add sensitivity to the audit page.

Alayan-stk-2 commented 5 years ago

The test for 2moves_v2 has been completed with 40K games, probably for a few days already. What now ?

vdbergh commented 5 years ago

@Alayan-stk-2 Do you mean these tests?

http://dfts-0.pigazzini.it/tests/stats/5c78fd23ded322106fdb9a65 http://dfts-0.pigazzini.it/tests/stats/5c7fa612ded32226389058fa

The first test measures the sensitivity of 2moves_v2.pgn (0.31563 [0.30508, 0.32619] (95%)) using 40000 games of sf10 vs sf9. The second test does a similar measurement for 2moves_v1.pgn. It is still ongoing.

Note that even if 2moves_v1.pgn comes out decisively better, this may still be due to selection bias (the patches that make up the difference between sf9 and sf10 were selected based on tests with the 2moves_v1 book, so it would be natural if these patches work better with the 2moves_v1 book). Selection bias seems to be very hard to rule out.

dorzechowski commented 5 years ago

In my limited tests (see below) 2moves_v2 sensitivity looks not worse than 2moves_v1, although error bars overlap. 2moves_v2 gives more draws but also fewer 1-0, 1-0 results and in the end SNR seems fine. Looking forward to some conclusions from @vdbergh tests above, very interesting stats collected there. Btw, it would be interesting to split result 0.5 in pentanomial distribution into drawish 1/2-1/2, 1/2-1/2 and lopsided for one colour 1-0, 1-0 (or 0-1, 0-1). Doesn't change calculations but I think it's a nice thing to know.

For example (TC 10+0.1):

2moves_v1
 1.  Stockfish 080219           1041.5/2000  501-418-1081
 2.  Stockfish 10                958.5/2000  418-501-1081

2moves_v2
 1.  Stockfish 080219           1046.0/2000  464-372-1164
 2.  Stockfish 10                954.0/2000  372-464-1164
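
The split of the middle pentanomial bucket suggested above could be tallied along these lines. This is only a sketch: results are assumed to be PGN result strings, the first game of each pair is assumed to have the tested engine as White, and the sample pairs are invented.

```python
from collections import Counter

# PGN result string -> (White's points, Black's points)
POINTS = {"1-0": (1.0, 0.0), "0-1": (0.0, 1.0), "1/2-1/2": (0.5, 0.5)}


def pair_category(game1, game2):
    """Classify a game pair played from one opening with colours reversed.

    game1: tested engine plays White; game2: tested engine plays Black.
    """
    engine_points = POINTS[game1][0] + POINTS[game2][1]
    if engine_points != 1.0:
        return f"pair score {engine_points}"
    if game1 == "1/2-1/2":
        return "pair score 1.0 (two draws)"
    if game1 == "1-0":
        return "pair score 1.0 (White won both)"
    return "pair score 1.0 (Black won both)"


# Invented sample of game pairs, for illustration only.
pairs = [("1/2-1/2", "1/2-1/2"), ("1-0", "1-0"), ("1-0", "1/2-1/2"),
         ("0-1", "0-1"), ("1-0", "0-1"), ("1/2-1/2", "1/2-1/2")]
tally = Counter(pair_category(a, b) for a, b in pairs)
for category, count in sorted(tally.items()):
    print(category, count)
```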

vdbergh commented 5 years ago

@dorzechowski

The tests

http://dfts-0.pigazzini.it/tests/stats/5c78fd23ded322106fdb9a65 http://dfts-0.pigazzini.it/tests/stats/5c7fa612ded32226389058fa

are both finished. At 10+0.1 there is indeed no discernible difference in resolution between 2moves_v1.pgn and 2moves_v2.pgn (measured with sf10 vs sf9). As expected 2moves_v2.pgn has less bias.

MJZ1977 commented 5 years ago

So, can we change the opening book quickly or do we need more tests?

Alayan-stk-2 commented 5 years ago

What's the status for this ?