amosborne closed this issue 3 years ago.
d659b8aa2adb6b3524a88c3199c8c1896cd01870 adds a new `poetry run hikuwr --extract` option that creates two new databases for player rating experimentation and testing.
The two datasets are entirely distinct. Both are drawn from the surge of new players and matches that followed the release of puyolobby.com; that data was split into two approximately equal-sized portions along community lines. Each dataset covers a two-month timespan, with about 1000 matches in total across 250 players.
Note that the community detection algorithm used to split the datasets is non-deterministic, so the resulting databases will change each time the command is invoked.
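To illustrate how a community-based split works and where the non-determinism comes from, here is a minimal sketch assuming matches are given as `(player_a, player_b)` pairs. I don't know which community algorithm `--extract` actually uses; this sketch substitutes asynchronous label propagation, whose random visiting order and random tie-breaking make the result vary between runs. Passing a fixed `seed` makes it reproducible. The function names, the greedy balancing by match count, and the dropping of cross-community matches are all hypothetical choices for this example.

```python
import random
from collections import Counter, defaultdict


def label_propagation(matches, seed=None, max_rounds=100):
    """Asynchronous label propagation over the player-vs-player match graph."""
    rng = random.Random(seed)
    weights = defaultdict(Counter)  # player -> Counter(opponent -> number of matches)
    for a, b in matches:
        if a != b:
            weights[a][b] += 1
            weights[b][a] += 1
    labels = {p: p for p in weights}  # every player starts in its own community
    nodes = list(weights)
    for _ in range(max_rounds):
        rng.shuffle(nodes)  # random visiting order: one source of non-determinism
        changed = False
        for node in nodes:
            votes = Counter()
            for neighbour, w in weights[node].items():
                votes[labels[neighbour]] += w
            best = max(votes.values())
            candidates = [lab for lab, v in votes.items() if v == best]
            if labels[node] in candidates:
                continue  # keep the current label if it is already a top choice
            labels[node] = rng.choice(candidates)  # random tie-break: another source
            changed = True
        if not changed:
            break
    return labels


def split_by_community(matches, seed=None):
    """Split matches into two halves whose players come from disjoint communities."""
    labels = label_propagation(matches, seed)
    load = Counter()  # matches touching each community (counted per endpoint)
    for a, b in matches:
        if a != b:
            load[labels[a]] += 1
            load[labels[b]] += 1
    group_of, totals = {}, [0, 0]
    # Greedily place the biggest communities first into the lighter half.
    for community, weight in sorted(load.items(), key=lambda kv: (-kv[1], kv[0])):
        g = 0 if totals[0] <= totals[1] else 1
        group_of[community] = g
        totals[g] += weight
    halves = ([], [])
    for a, b in matches:
        # Keep only matches whose players landed in the same half.
        if a != b and group_of[labels[a]] == group_of[labels[b]]:
            halves[group_of[labels[a]]].append((a, b))
    return halves
```

Calling `split_by_community(matches)` without a seed can return different halves on each run, which mirrors the behaviour described above; `split_by_community(matches, seed=42)` always returns the same halves for the same input.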
In 9252d5c69f5f00f62503ecfcc33c55115c9532cb I propose the first metric for validating any candidate rating algorithm: ranking order correlation. With Hiku's World Ranking as the benchmark, the two extracted puyolobby datasets are run through the Donguri Gaeru rating algorithm, the players' ranking orders under the two systems are compared, and the Pearson correlation coefficient between them is computed. A successful rating algorithm will yield a correlation close to 1.
At present, the Donguri Gaeru rating algorithm is entirely random and the resulting correlation is near zero. A plot is also provided to visualize the correlation.
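The metric can be sketched as follows, assuming each system's output is a dict mapping player name to rating (higher is better). Computing Pearson's r on rank positions rather than raw ratings is equivalent to Spearman's rank correlation, so the metric is unaffected by the two systems using different rating scales. The average-rank tie handling and all function names are my assumptions, not necessarily what the commit does, and the data in the `__main__` block is made up for illustration.

```python
import random


def ranks(values):
    """Rank positions (1 = highest value); tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    result = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        average = (start + end) / 2 + 1
        for pos in range(start, end + 1):
            result[order[pos]] = average
        start = end + 1
    return result


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length, non-constant sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sum((x - mean_x) ** 2 for x in xs) ** 0.5
    sd_y = sum((y - mean_y) ** 2 for y in ys) ** 0.5
    return cov / (sd_x * sd_y)


def rank_order_correlation(benchmark, candidate):
    """Correlate rank orders of the players rated by both systems."""
    players = sorted(set(benchmark) & set(candidate))
    return pearson(ranks([benchmark[p] for p in players]),
                   ranks([candidate[p] for p in players]))


if __name__ == "__main__":
    # Hypothetical data: 250 players with evenly spaced benchmark ratings.
    benchmark = {f"p{i}": 1000 + 10 * i for i in range(250)}
    rng = random.Random(0)
    random_ratings = {p: rng.random() for p in benchmark}
    print(rank_order_correlation(benchmark, benchmark))       # perfect agreement, about 1
    print(rank_order_correlation(benchmark, random_ratings))  # random baseline, near 0
```

The random baseline matches the situation described above: with unrelated ratings, Spearman's correlation has mean 0 and variance 1/(n - 1), so with 250 players it typically lands within about ±0.06 of zero.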
I'm not sure what use a random algorithm would have for testing. I was under the impression that we were using Glicko-2, as it's a popular, easy, and well-tested option. From the looks of this, another approach is being taken; are we developing our own system? I may have misunderstood the intentions, so it'd be great if you could clarify.
The random algorithm was literally just so I could generate a plot and test the code I've written so far.
Thanks, I was unsure. If it's alright, I can go ahead and write a Glicko-2 implementation to use.
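For reference, here is a minimal sketch of a single Glicko-2 rating-period update, following the step-by-step procedure in Glickman's "Example of the Glicko-2 system" document. Assumptions: the system constant τ defaults to 0.5 (it would need tuning for our data), the convergence tolerance is 1e-6, and results are passed as `(opponent_rating, opponent_rd, score)` tuples on the familiar 1500-centred scale. The function name and signature are my own, not anything already in the repo.

```python
import math

SCALE = 173.7178  # converts between the Glicko and Glicko-2 rating scales


def _g(phi):
    # Down-weights a game according to the opponent's rating deviation.
    return 1.0 / math.sqrt(1.0 + 3.0 * phi ** 2 / math.pi ** 2)


def _expected(mu, mu_j, phi_j):
    # Expected score against opponent j on the Glicko-2 scale.
    return 1.0 / (1.0 + math.exp(-_g(phi_j) * (mu - mu_j)))


def glicko2_update(rating, rd, vol, results, tau=0.5, eps=1e-6):
    """Return (rating, rd, vol) after one rating period.

    results: list of (opponent_rating, opponent_rd, score), score in {1, 0.5, 0}.
    """
    mu, phi = (rating - 1500.0) / SCALE, rd / SCALE
    if not results:
        # No games: rating and volatility stay put, uncertainty grows.
        return rating, math.sqrt(phi ** 2 + vol ** 2) * SCALE, vol

    v_inv, score_sum = 0.0, 0.0
    for opp_rating, opp_rd, score in results:
        mu_j, phi_j = (opp_rating - 1500.0) / SCALE, opp_rd / SCALE
        g, e = _g(phi_j), _expected(mu, mu_j, phi_j)
        v_inv += g ** 2 * e * (1.0 - e)
        score_sum += g * (score - e)
    v = 1.0 / v_inv          # estimated variance of the rating from game outcomes
    delta = v * score_sum    # estimated rating improvement

    # New volatility: find the root of f(x) with the Illinois (regula falsi) method.
    a = math.log(vol ** 2)

    def f(x):
        ex = math.exp(x)
        return (ex * (delta ** 2 - phi ** 2 - v - ex)
                / (2.0 * (phi ** 2 + v + ex) ** 2)) - (x - a) / tau ** 2

    A = a
    if delta ** 2 > phi ** 2 + v:
        B = math.log(delta ** 2 - phi ** 2 - v)
    else:
        k = 1
        while f(a - k * tau) < 0:
            k += 1
        B = a - k * tau
    fA, fB = f(A), f(B)
    while abs(B - A) > eps:
        C = A + (A - B) * fA / (fB - fA)
        fC = f(C)
        if fC * fB <= 0:
            A, fA = B, fB
        else:
            fA /= 2.0
        B, fB = C, fC
    new_vol = math.exp(A / 2.0)

    # New deviation and rating on the Glicko-2 scale, then convert back.
    phi_star = math.sqrt(phi ** 2 + new_vol ** 2)
    new_phi = 1.0 / math.sqrt(1.0 / phi_star ** 2 + 1.0 / v)
    new_mu = mu + new_phi ** 2 * score_sum
    return 1500.0 + SCALE * new_mu, SCALE * new_phi, new_vol


if __name__ == "__main__":
    # Worked example from Glickman's document; it reports roughly 1464.06, 151.52, 0.05999.
    print(glicko2_update(1500, 200, 0.06, [(1400, 30, 1), (1550, 100, 0), (1700, 300, 0)]))
```

Within a rating period, every player should be updated from the pre-period ratings of their opponents rather than sequentially, since that is how Glicko-2 defines a period.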
I did some research into ways we could represent a ranking for a leaderboard, and I found GLIXARE.
GLIXARE (GXE) is a formula, used by TETR.IO and some other games with Glicko-style rating systems, that condenses a player's rating and rating deviation (RD) into a single number: the estimated percentage chance that the player wins a match against an opponent. It's outlined here: https://www.smogon.com/forums/threads/gxe-glixare-a-much-better-way-of-estimating-a-players-overall-rating-than-shoddys-cre.51169/
The formula in question (in Python syntax):
round(10000 / (1 + 10**(((1500 - rating) * pi / sqrt(3 * log(10)**2 * rd**2 + 2500 * (64 * pi**2 + 147 * log(10)**2)))))) / 100
If players need an RD below 100 to receive a rank, then this formula would be well suited to producing a more concrete number for rankings. Try comparing it against Hiku's World Ranking to see whether it works better.
Thoughts?
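Here is the formula above as a runnable Python sketch. Working through the algebra, it equals 100 times the Glicko-1 expected score against an opponent rated 1500 with RD 350 (Glicko's default new-player values), using the combined deviation sqrt(RD² + 350²); `glicko_expected` is included only to check that numerically. `leaderboard` and its `rd_cap=100.0` threshold are hypothetical, illustrating the "rank only players below 100 RD" idea, and the player tuples in the usage lines are made up.

```python
from math import log, pi, sqrt


def gxe(rating, rd):
    """GLIXARE: estimated % chance of beating a 1500-rated, RD 350 opponent."""
    denom = sqrt(3 * log(10) ** 2 * rd ** 2 + 2500 * (64 * pi ** 2 + 147 * log(10) ** 2))
    return round(10000 / (1 + 10 ** ((1500 - rating) * pi / denom))) / 100


def glicko_expected(rating, rd, opp_rating=1500.0, opp_rd=350.0):
    """Glicko-1 expected score using the combined deviation of both players."""
    q = log(10) / 400
    g = 1 / sqrt(1 + 3 * q ** 2 * (rd ** 2 + opp_rd ** 2) / pi ** 2)
    return 1 / (1 + 10 ** (-g * (rating - opp_rating) / 400))


def leaderboard(players, rd_cap=100.0):
    """Rank (name, rating, rd) tuples by GXE, skipping anyone with rd >= rd_cap."""
    eligible = [(name, gxe(rating, rd)) for name, rating, rd in players if rd < rd_cap]
    eligible.sort(key=lambda item: item[1], reverse=True)
    return [(place, name, score) for place, (name, score) in enumerate(eligible, start=1)]


if __name__ == "__main__":
    # Hypothetical players: bob is excluded because his RD is above the cap.
    players = [("alice", 1650, 60), ("bob", 1700, 140), ("carol", 1550, 90)]
    for place, name, score in leaderboard(players):
        print(place, name, score)
```

Because a larger RD pulls GXE toward 50%, sorting by GXE is more conservative than sorting by raw rating: a high-rated player with few games ranks lower until their RD shrinks. Counting the entries returned by `leaderboard` would also show how many players in the test data actually fall below the RD cap.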
This is a neat idea. I’m not sure how many people in my test data actually get below 100 RD, but I will try this and see.
I propose to close this issue upon merge of the linked pull request. Hiku's World Ranking rating algorithm has been implemented and validated according to the following:
Moving forward, the most important decision to be made is how to present player ratings/rankings on the website so that they best convey a player's progression. For the initial release it may be best to simply show both ratings and rankings on the website, along with a short write-up on how to interpret those numbers.
Develop some methodology for quantifying the effectiveness of the ranking system algorithm (in the MLP specification, currently proposed as Glicko-2, potentially with some TBD modifications).
With Hiku's World Ranking as a benchmark, run the initially proposed algorithm against the test database and compare the results to Hiku's rating system. I propose doing this in a Jupyter Notebook with some plots and metrics to visualize the rating distribution and how well the relative order and uncertainty correlate between the two systems.
Ultimately our own rating system goals will need to be defined with corresponding metrics to quantify algorithm effectiveness, but creating a simple visualization to compare systems and get a feel for the data will be a great first step.