Open tmmlaarhoven opened 3 years ago
How about just basing the current batches on the dates games are finished rather than when they started? Wouldn't that just fix the issue? 🤔
If that fixes the issue for correspondence games while providing the same results for the rest, it may not be necessary to split the database.
Unfortunately there's no such database field. The time of the last move + game over could get close, but there's no index on that.
(migrated from https://github.com/ornicar/lila/issues/9721)
Regarding the databases on https://database.lichess.org/: most of the monthly files were generated long after the months in question were over, which meant that correspondence games started in those months had long since finished.
However, now that new databases are generated shortly after the end of each month, the PGN files contain correspondence games that were still in progress at export time and therefore have many moves missing.
An example from the July 2021 database: a half-finished correspondence game that started some time in July but only finished some time in August/September after 60 moves (119 plies), as can be seen at https://lichess.org/TY9oxOqR.
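To get a rough idea of how many games in a monthly file are affected, one could scan the PGN headers. The sketch below assumes that games exported while still in progress carry the standard PGN result `*` (the PGN convention for "in progress or unknown"); whether every truncated game in the Lichess dumps is tagged that way is an assumption, and `count_unfinished` and the sample data are made up for illustration.

```python
import io
import re

# Matches a PGN Result tag pair, e.g. [Result "1-0"] or [Result "*"].
RESULT_TAG = re.compile(r'^\[Result "([^"]*)"\]')

def count_unfinished(pgn_lines):
    """Return (total_games, unfinished_games) for an iterable of PGN lines.

    Assumption: each game has exactly one Result tag, and "*" marks a game
    that had not finished when it was exported.
    """
    total = 0
    unfinished = 0
    for line in pgn_lines:
        match = RESULT_TAG.match(line)
        if match:
            total += 1
            if match.group(1) == "*":
                unfinished += 1
    return total, unfinished

# Tiny made-up sample: one finished game, one still in progress.
sample = io.StringIO(
    '[Event "Rated Blitz game"]\n'
    '[Result "1-0"]\n'
    '\n'
    '1. e4 e5 1-0\n'
    '\n'
    '[Event "Rated Correspondence game"]\n'
    '[Result "*"]\n'
    '\n'
    '1. d4 d5 *\n'
)
total, unfinished = count_unfinished(sample)
```

For a real monthly dump the same function could be fed a decompressed stream line by line, so the whole file never has to fit in memory.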
It probably makes sense to put correspondence games in separate databases and to export them in batches based on the dates the games finished rather than the dates they started. Otherwise these databases will always contain many unfinished games, and the complete games will not appear in any subsequent database either. (Alternatively, one would have to wait several months before generating each export, as correspondence games might still be in progress.)
So, maybe the nicest solution: separate correspondence games from the main "standard chess" database, and batch that separate database according to the dates the games finished, rather than started.
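As a minimal sketch of that proposal (not how the Lichess export actually works): non-correspondence games keep being batched by start month, while correspondence games go into a separate database keyed by finish month and are held back while still in progress. The `Game` record, its field names, and `batch_key` are hypothetical; the replies in this thread note that no indexed "finished at" field exists, so `finished_at` here stands in for whatever approximation (such as the time of the last move on a finished game) would be used.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional, Tuple

@dataclass
class Game:
    # Hypothetical record; field names are illustrative, not the lila schema.
    game_id: str
    speed: str                       # e.g. "blitz", "correspondence"
    started_at: datetime
    finished_at: Optional[datetime]  # None while the game is in progress

def batch_key(game: Game) -> Optional[Tuple[str, str]]:
    """Return (database, "YYYY-MM") for a game, or None to hold it back."""
    if game.speed == "correspondence":
        if game.finished_at is None:
            return None  # not exported until the game is over
        return ("correspondence", game.finished_at.strftime("%Y-%m"))
    # All other games: unchanged behaviour, batched by start month.
    return ("standard", game.started_at.strftime("%Y-%m"))

# Made-up dates for illustration only.
finished = Game("aaaa", "correspondence",
                datetime(2021, 7, 10), datetime(2021, 9, 2))
ongoing = Game("bbbb", "correspondence", datetime(2021, 7, 20), None)
blitz = Game("cccc", "blitz",
             datetime(2021, 7, 31, 23, 59), datetime(2021, 8, 1, 0, 4))

keys = [batch_key(g) for g in (finished, ongoing, blitz)]
```

With this rule a correspondence game appears exactly once, complete, in the month it ended, and the main database is unaffected.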
See also the corresponding Zulip discussion.