Unfinished correspondence games

tmmlaarhoven commented 3 years ago

(migrated from https://github.com/ornicar/lila/issues/9721)

Regarding the databases on https://database.lichess.org/: most monthly files were generated long after the month in question was over, which meant that correspondence games started in that month had long since finished.

However, with new databases being generated shortly after the end of each month, the PGN databases now contain correspondence games which were still in progress at export time and therefore have many moves missing.

An example of such a game from the July 2021 database: a half-finished correspondence game which started in July, but only finished some time in August/September after 60 moves (119 plies), as can be seen at https://lichess.org/TY9oxOqR. The database entry stops at White's 29th move:

[Event "Rated Correspondence game"]
[Site "https://lichess.org/TY9oxOqR"]
[Date "2021.07.17"]
[Round "-"]
[White "mahatma09"]
[Black "bishopdaniel"]
[Result "*"]
[UTCDate "2021.07.17"]
[UTCTime "15:30:24"]
[WhiteElo "1752"]
[BlackElo "1752"]
[ECO "D00"]
[Opening "Queen's Pawn Game: Chigorin Variation"]
[TimeControl "-"]
[Termination "Unterminated"]

1. d4 d5 2. Nc3 Nc6 3. Nf3 Nf6 4. Bg5 Bg4 5. e3 e6 6. a3 Be7 7. Bb5 O-O 8. Bxc6 bxc6 9. h3 Bxf3 10. Qxf3 h6 11. Bh4 Rb8 12. Rb1 c5 13. O-O cxd4 14. exd4 Qd6 15. Rfe1 Rbe8 16. Bg3 Qd7 17. a4 Bb4 18. Re2 Bxc3 19. Qxc3 Qxa4 20. Qxc7 Rc8 21. Qf4 Rxc2 22. Rxc2 Qxc2 23. Qc1 Rc8 24. Qxc2 Rxc2 25. b4 Ne4 26. Bb8 a6 27. f3 Nc3 28. Ra1 Ra2 29. Rxa2 *
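For anyone who wants to spot such entries in a downloaded file, here is a minimal sketch. It assumes, based only on the example above, that games caught mid-play carry both `[Result "*"]` and `[Termination "Unterminated"]`. The helper names `parse_headers` and `is_unfinished` are made up for illustration, and the sample is an abbreviated copy of the game above.

```python
import re

# Matches one PGN tag pair, e.g. [Result "*"]
TAG_RE = re.compile(r'^\[(\w+)\s+"(.*)"\]\s*$')


def parse_headers(game_text):
    """Collect the tag pairs of a single PGN game into a dict."""
    headers = {}
    for line in game_text.splitlines():
        match = TAG_RE.match(line.strip())
        if match:
            headers[match.group(1)] = match.group(2)
    return headers


def is_unfinished(headers):
    """True if the game was exported while still in progress.

    Assumption: such games have Result "*" and Termination "Unterminated",
    as in the example from the July 2021 database.
    """
    return (
        headers.get("Result") == "*"
        and headers.get("Termination") == "Unterminated"
    )


# Abbreviated copy of the example game (most tags and moves omitted).
sample = '''[Event "Rated Correspondence game"]
[Site "https://lichess.org/TY9oxOqR"]
[Result "*"]
[Termination "Unterminated"]

1. d4 d5 2. Nc3 Nc6 *
'''

headers = parse_headers(sample)
print(headers["Site"], is_unfinished(headers))
```

This only reads the tag section, so it should stay cheap even on large monthly files, as long as the games are fed in one at a time.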

For correspondence games, it probably makes sense to make separate databases for them, and export them in batches based on the dates the games finished rather than when they started. Otherwise there will always be lots of these unfinished games in these databases, and the full games will not appear in any subsequent databases either. (Or one would have to wait several months before generating the export, as correspondence games might still be in progress.)

So, maybe the nicest solution: separate correspondence games from the main "standard chess" database, and batch that separate database according to the dates the games finished, rather than started.
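A rough sketch of what that batching could look like, purely as an illustration. It assumes each game comes with a finish date, which the PGN headers above do not contain, so `finished_on` is a hypothetical input. The `Site` values and dates in the sample are made up, and detecting correspondence games via the `Event` tag is a guess based on the example.

```python
from collections import defaultdict
from datetime import date


def is_correspondence(headers):
    # Assumption: the Event tag mentions "Correspondence", as in
    # "Rated Correspondence game" from the example above.
    return "correspondence" in headers.get("Event", "").lower()


def batch_by_finish_month(games):
    """Group finished correspondence games by the month they ended.

    `games` is an iterable of (headers, finished_on) pairs, where
    finished_on is a hypothetical date, or None if still in progress.
    """
    batches = defaultdict(list)
    for headers, finished_on in games:
        if not is_correspondence(headers) or finished_on is None:
            continue  # other variants and ongoing games are left out
        batches[finished_on.strftime("%Y-%m")].append(headers["Site"])
    return dict(batches)


# Made-up sample data, for illustration only.
games = [
    ({"Event": "Rated Correspondence game", "Site": "game-a"}, date(2021, 8, 30)),
    ({"Event": "Rated Correspondence game", "Site": "game-b"}, None),
    ({"Event": "Rated Blitz game", "Site": "game-c"}, date(2021, 7, 17)),
]

print(batch_by_finish_month(games))
```

With this grouping, a game still in progress appears in no batch at first and shows up complete in the batch for the month it ends.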

See also the corresponding Zulip discussion.

pepellou commented 2 years ago

How about just basing the current batches on the dates games are finished rather than when they started? Wouldn't that just fix the issue? 🤔

If that fixes the issue for correspondence games while providing the same results for the rest, it may not be necessary to split the database.

niklasf commented 2 years ago

Unfortunately there's no such database field. Combining the time of the last move with the game-over status could get close, but there's no index on that.
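To make the approximation concrete, here is a small sketch over in-memory records. The field names `is_over` and `last_move_at` are hypothetical and do not reflect the actual schema. The linear scan stands in for what a query without an index would have to do.

```python
from datetime import datetime, timezone


def approximate_finish_time(game):
    """Use the last-move time as the finish time, but only for games that are over."""
    if not game["is_over"]:
        return None
    return game["last_move_at"]


def finished_in_month(games, year, month):
    """Linear scan: without an index, every record has to be inspected."""
    selected = []
    for game in games:
        finished = approximate_finish_time(game)
        if finished is not None and (finished.year, finished.month) == (year, month):
            selected.append(game["id"])
    return selected


# Made-up records, for illustration only.
games = [
    {"id": "a", "is_over": True,
     "last_move_at": datetime(2021, 8, 30, 12, 0, tzinfo=timezone.utc)},
    {"id": "b", "is_over": False,
     "last_move_at": datetime(2021, 8, 31, 9, 0, tzinfo=timezone.utc)},
    {"id": "c", "is_over": True,
     "last_move_at": datetime(2021, 7, 17, 15, 30, tzinfo=timezone.utc)},
]

print(finished_in_month(games, 2021, 8))
```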

lichess-org / database

Unfinished correspondence games #37