How do you define "player's first stable rating"?

jcw024 / lichess_database_ETL

pipeline for migrating lichess data into postgresql


How do you define "player's first stable rating"? #2

Open aramcb opened 1 year ago

aramcb commented 1 year ago

I couldn't tell from the code how this was defined. Have I missed something?

From what I can tell from Lichess's open database files, they do not indicate whether a rating is provisional, i.e., whether it carries a "?".

jcw024 commented 1 year ago

You are right about the provisional ratings: the dataset doesn't mark a provisional rating with a question mark. This was also something I wondered about, since there wasn't a foolproof way to identify a provisional rating from the individual data points. During the analysis, I ended up treating a game as a provisional-rating game when the ratingdiff (rating gain/loss after the game) was greater than 30. The rating gain should get smaller as more games are played. For example: https://github.com/jcw024/lichess_database_ETL/blob/main/analytics/psycopg2_query.py#L185

You could set the cutoff higher or lower if you prefer. Thinking back, I think it would have been better to set it lower, maybe to 10 or 15. I think I didn't want to exclude rated games between players with a large rating gap where the lower-rated player beats the higher-rated player, but those situations are probably extremely rare.
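To make the cutoff idea concrete, here is a minimal Python sketch of the heuristic described above. It is not the query from the linked file: the function name `first_stable_rating`, the `(rating, rating_diff)` tuple layout, the use of the absolute value of the diff, and the sample history are all assumptions made for illustration.

```python
from typing import Iterable, Optional, Tuple


def first_stable_rating(
    games: Iterable[Tuple[int, int]], cutoff: int = 30
) -> Optional[int]:
    """Return the pre-game rating of the first game whose rating change
    is within the cutoff, or None if no such game exists.

    `games` must be in chronological order; each item is a hypothetical
    (rating_before_game, rating_diff) pair for one player.
    """
    for rating, rating_diff in games:
        # Assumption: a swing larger than the cutoff (gain or loss)
        # is treated as a sign that the rating is still provisional.
        if abs(rating_diff) <= cutoff:
            return rating
    return None


# Made-up game history for one player, oldest game first.
history = [(1500, 158), (1658, -97), (1561, 44), (1605, 12), (1617, -9)]

print(first_stable_rating(history))             # 1605
print(first_stable_rating(history, cutoff=10))  # 1617
```

Lowering `cutoff` to 10 or 15 makes the filter stricter, so the first "stable" rating is picked from a later game.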

aramcb commented 1 year ago

"The rating gain [RatingDiff] should get smaller as more games are played."

Can you elaborate on why this is the case? I'm looking at some of my first games on Lichess and this seems accurate: i.e., a 1500? vs a 1567 results in a much larger RatingDiff (gain/loss) for the provisional (?) player than for the stable player.
If this is true (that RatingDiff is larger for players with provisional ratings), then this is a good method to prevent players who are accurately rated at 1500 (the starting rating) from gaining/losing too many rating points simply because their accurate rating is the default and they are winning/losing against players with provisional ratings.

Also: According to Lichess FAQ:

[Provisional]...it means that the Glicko-2 deviation is greater than 110

PS: thanks for this very cool analysis! I'm awed by the work you put into assembling the relational database (I'm limiting my own analysis to one year, as the setup overhead looks very daunting!).

jcw024 commented 1 year ago

I think there's a pretty good explanation for how the rating diff changes over time/games under another Lichess FAQ here.
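As a rough illustration of why the rating diff shrinks, here is a small Python sketch of a single-game update using the simpler Glicko-1 formulas. This is an assumption made for illustration only: Lichess uses Glicko-2, which adds a volatility term, so the numbers will not match Lichess's actual RatingDiff values. The function names and the example ratings and deviations are hypothetical.

```python
import math

Q = math.log(10) / 400  # Glicko-1 scaling constant


def g(rd: float) -> float:
    """Dampening factor: the less certain the opponent's rating, the less it counts."""
    return 1 / math.sqrt(1 + 3 * Q**2 * rd**2 / math.pi**2)


def rating_change(r: float, rd: float, opp_r: float, opp_rd: float, score: float) -> float:
    """Glicko-1 rating change after one game (score: 1 win, 0.5 draw, 0 loss)."""
    g_opp = g(opp_rd)
    expected = 1 / (1 + 10 ** (-g_opp * (r - opp_r) / 400))
    inv_d2 = Q**2 * g_opp**2 * expected * (1 - expected)
    # A large deviation rd makes 1/rd**2 small, which makes the step large.
    return Q / (1 / rd**2 + inv_d2) * g_opp * (score - expected)


# Same win against a 1567-rated opponent, with different deviations:
new_player = rating_change(1500, 350, 1567, 50, 1.0)  # high deviation (uncertain rating)
settled = rating_change(1500, 60, 1567, 50, 1.0)      # low deviation (established rating)
print(round(new_player), round(settled))
```

The only difference between the two calls is the player's own deviation, and the high-deviation player gains far more from the same result. As the deviation drops with more games, the per-game change drops with it.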

I'm glad you enjoyed the analysis. It's been over a year since I worked on this, so I'm happy people are still finding it useful/interesting. I think the main hurdle is the setup, but once it's set up, the process is the same whether you download 1 year or multiple years. It just depends on how much storage you have on your machine and how long you want to wait for it to download/process everything. More recent years have a lot more data than earlier years and will take more memory/time to process.