Open aramcb opened 1 year ago
You are right about the provisional ratings: the dataset doesn't mark a provisional rating with a question mark. I wondered about this too, since there is no foolproof way to identify a provisional rating from the individual data points. During the analysis I ended up treating a game as involving a provisional rating when the ratingdiff (rating gain/loss after the game) was greater than 30, since the rating gain should get smaller as more games are played. For example: https://github.com/jcw024/lichess_database_ETL/blob/main/analytics/psycopg2_query.py#L185 You could set the cutoff higher or lower if you prefer. Thinking back, it would probably have been better to set it lower, maybe to 10 or 15. I think I didn't want to exclude rated games between players with a large rating gap where the lower-rated player beats the higher-rated player, but those situations are probably extremely rare.
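To make the heuristic concrete, here is a minimal Python sketch of the cutoff idea. It is not the code from the linked query: the function name, the dictionary keys, and the sample games are all made up for illustration, and taking the absolute value of each player's rating change is an assumption about how a "gain/loss greater than 30" would be applied.

```python
def is_likely_provisional(white_diff, black_diff, cutoff=30):
    """Heuristic only: flag a game when either player's rating change
    is larger in magnitude than the cutoff.

    The dataset has no explicit provisional marker, so a large rating
    swing is used as a stand-in. The cutoff is a tunable assumption.
    """
    return max(abs(white_diff), abs(black_diff)) > cutoff


# Hypothetical sample games (invented values, not taken from the dataset).
games = [
    {"id": "a", "white_rating_diff": 6, "black_rating_diff": -6},
    {"id": "b", "white_rating_diff": 154, "black_rating_diff": -12},
    {"id": "c", "white_rating_diff": -18, "black_rating_diff": 19},
]

# Keep only games where neither player looks provisional.
established = [
    g["id"]
    for g in games
    if not is_likely_provisional(g["white_rating_diff"], g["black_rating_diff"])
]
print(established)
```

With the default cutoff of 30, only game "b" is flagged. Lowering the cutoff to 15 would also flag game "c", which is the trade-off mentioned above: a lower cutoff catches more provisional ratings but also drops some legitimate upsets between established players.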
The rating gain [RatingDiff] should get smaller as more games are played.
Also, according to the Lichess FAQ:
[Provisional]...it means that the Glicko-2 deviation is greater than 110
PS: Thanks for this very cool analysis! I'm awed by the work you put into assembling the relational database. (I'm limiting my own analysis to one year, as the setup overhead looks very daunting!)
I think there's a pretty good explanation for how the rating diff changes over time/games under another Lichess FAQ here.
I'm glad you enjoyed the analysis. It's been over a year since I worked on this, so I'm happy people are still finding it useful/interesting. I think the main hurdle is the setup, but once it's set up, the process is the same whether you download one year or multiple years. It just depends on how much storage you have on your machine and how long you want to wait for it to download/process everything. More recent years have a lot more data than earlier years and will take more memory/time to process.
I couldn't tell from the code how this was defined. Have I missed something?
From what I can tell from Lichess's open database files, they do not indicate the presence or absence of a provisional (i.e., "?") rating.