openai / weak-to-strong

Preprocessed Chess Puzzle Data #10

Open y12uc231 opened 11 months ago

y12uc231 commented 11 months ago

Hi,

I was trying to reproduce the results for the chess puzzle dataset, and it seems the original dataset was preprocessed to convert FEN positions into move sequences. However, multiple move sequences can reach the same board position, so the conversion is not unique. Could you share the preprocessing script or the preprocessed data used in the experiments?

Thanks, Satya

SecDante commented 11 months ago

Thanks

y12uc231 commented 11 months ago

Hi all,

Bumping this in case I am missing something or any further information is needed from my end. I would appreciate any help with this.

-Satya

WuTheFWasThat commented 11 months ago

I believe the move-sequence data exists somewhere; @pavel-izmailov would know the details.

pavel-izmailov commented 11 months ago

Hey @y12uc231, the original data from Lichess is indeed in FEN notation, but each puzzle is also extracted from a real game, so the move sequence leading to the position is well defined. You can find a database of puzzles as a CSV here. Each entry should contain the ID of the game from which the position was extracted. You can then use the Lichess API to fetch the game by its ID and convert it to move-sequence notation.
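For anyone following along, here is a rough sketch of those steps in Python, not the script used for the paper. It assumes the puzzle CSV stores the source game as a URL of the form `https://lichess.org/<game_id>/black#<ply>` and that games can be downloaded as PGN from `https://lichess.org/game/export/<game_id>`; check both against the current Lichess documentation. The helper names (`parse_game_url`, `export_url`, `fetch_pgn`, `san_moves`) are made up for illustration.

```python
import re
from urllib.parse import urlparse
from urllib.request import urlopen


def parse_game_url(game_url):
    """Split a puzzle's game URL into (game_id, ply).

    Assumes URLs like "https://lichess.org/787zsVup/black#48".
    """
    parsed = urlparse(game_url)
    # The first path segment is taken to be the game id.
    game_id = parsed.path.strip("/").split("/")[0]
    # The fragment is assumed to be a ply number; None if absent.
    ply = int(parsed.fragment) if parsed.fragment.isdigit() else None
    return game_id, ply


def export_url(game_id):
    """Build the (assumed) PGN export URL for a game id."""
    return f"https://lichess.org/game/export/{game_id}"


def fetch_pgn(game_id):
    """Download the PGN text of a game (performs a network request)."""
    with urlopen(export_url(game_id)) as response:
        return response.read().decode("utf-8")


def san_moves(pgn_text):
    """Extract the main-line moves of a PGN as a list of SAN strings."""
    # Drop header tag lines such as [Event "..."].
    text = re.sub(r'^\[\w+ ".*"\]\s*$', "", pgn_text, flags=re.M)
    # Drop {...} comments (clock times, evaluations).
    text = re.sub(r"\{[^}]*\}", " ", text)
    # Drop (...) variations, innermost first, to handle nesting.
    while re.search(r"\([^()]*\)", text):
        text = re.sub(r"\([^()]*\)", " ", text)
    # Drop numeric annotation glyphs such as $1.
    text = re.sub(r"\$\d+", " ", text)

    moves = []
    for token in text.split():
        # Strip move numbers: "1.", "12...", or a fused "1.e4".
        token = re.sub(r"^\d+\.+", "", token)
        if not token or token in {"1-0", "0-1", "1/2-1/2", "*"}:
            continue
        moves.append(token.rstrip("?!"))
    return moves


# Offline usage example with a hand-written PGN (no network access).
sample_pgn = (
    '[Event "Example"]\n[Site "Example"]\n\n'
    "1. e4 { [%clk 0:03:00] } e5 2. Nf3 Nc6 3. Bb5 a6 1-0\n"
)
game_id, ply = parse_game_url("https://lichess.org/787zsVup/black#48")
print(game_id, ply)
print(san_moves(sample_pgn))
```

To get the prefix that leads to a puzzle position, you would truncate the result, e.g. `san_moves(fetch_pgn(game_id))[:ply]`. Whether the ply in the URL lines up exactly with the puzzle's FEN or is off by one is something I have not verified, so it is worth replaying a few games and comparing the resulting position with the FEN before trusting the cut-off.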