sublee / trueskill

An implementation of the TrueSkill rating system for Python
https://trueskill.org/

Large FFA produces unexpected mu values #23

Open jhansen461 opened 6 years ago

jhansen461 commented 6 years ago

Possibly related to #22

I ran multiple large FFAs. Some of the FFAs consist of large parts of the population while others are much smaller. I noticed that one player who only did a few of the smaller FFAs and performed relatively poorly had the largest mu of all the players while still maintaining a relatively small sigma. Does this appear to be an issue with my setup, this implementation of trueskill, or an issue with trueskill itself?

Here is my setup: draw_probability = 0, mu = 25, sigma = mu / 3, beta = sigma / 4
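For readers who want to reproduce this setup, here is a minimal sketch of how the stated parameters might be passed to the library. It assumes the `trueskill` package is installed and that each FFA is rated as one-player teams in finishing order. The helper names `make_env` and `rate_ffa` are hypothetical, and `tau` is left at the library default because the post does not mention it. Note that `beta = sigma / 4` is half the library's default of `sigma / 2`.

```python
# Parameters exactly as stated in the post.
MU = 25.0
SIGMA = MU / 3
BETA = SIGMA / 4
DRAW_PROBABILITY = 0.0


def make_env():
    """Build a TrueSkill environment from the stated parameters.

    Hypothetical helper; needs the third-party `trueskill` package.
    tau is not given in the post, so the library default is used.
    """
    import trueskill

    return trueskill.TrueSkill(
        mu=MU, sigma=SIGMA, beta=BETA, draw_probability=DRAW_PROBABILITY
    )


def rate_ffa(env, ratings_in_finish_order):
    """Rate one free-for-all (hypothetical helper).

    Each player is treated as a one-person team, and the position in the
    input list is used as the rank (0 = first place, no ties).
    """
    groups = [(rating,) for rating in ratings_in_finish_order]
    ranks = list(range(len(groups)))
    return [group[0] for group in env.rate(groups, ranks=ranks)]
```

A call might look like `rate_ffa(make_env(), [env_rating_a, env_rating_b, env_rating_c])`, with the ratings listed from first place to last.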

I have bolded the matches where both players competed. Matches are listed in chronological order.

Player 1 (identified externally as the best player):
trueskill.Rating(mu=51.219, sigma=3.449) 1 / 979
trueskill.Rating(mu=40.846, sigma=1.768) 13 / 890
trueskill.Rating(mu=38.448, sigma=1.334) 18 / 727
trueskill.Rating(mu=38.392, sigma=1.132) 3 / 800
trueskill.Rating(mu=38.980, sigma=1.049) 1 / 711
trueskill.Rating(mu=39.408, sigma=0.988) 1 / 578
trueskill.Rating(mu=39.387, sigma=0.911) 2 / 503
trueskill.Rating(mu=39.664, sigma=0.874) 1 / 355
trueskill.Rating(mu=39.789, sigma=0.851) 1 / 687
trueskill.Rating(mu=39.919, sigma=0.852) 2 / 139
trueskill.Rating(mu=39.947, sigma=0.851) 18 / 132
trueskill.Rating(mu=39.382, sigma=0.848) 8 / 128
trueskill.Rating(mu=39.404, sigma=0.851) 2 / 129
trueskill.Rating(mu=40.144, sigma=0.851) 1 / 116
trueskill.Rating(mu=39.502, sigma=0.847) 8 / 115
trueskill.Rating(mu=39.386, sigma=0.849) 1 / 80
trueskill.Rating(mu=39.386, sigma=0.853) 1 / 122
trueskill.Rating(mu=38.502, sigma=0.789) 34 / 1817
trueskill.Rating(mu=37.862, sigma=0.739) 16 / 1629
trueskill.Rating(mu=37.462, sigma=0.698) 8 / 1354
trueskill.Rating(mu=37.562, sigma=0.686) 1 / 1418
trueskill.Rating(mu=37.714, sigma=0.672) 1 / 1304
trueskill.Rating(mu=37.354, sigma=0.642) 10 / 1081
trueskill.Rating(mu=37.001, sigma=0.617) 17 / 975
trueskill.Rating(mu=36.832, sigma=0.596) 4 / 919
trueskill.Rating(mu=36.538, sigma=0.577) 11 / 1237
trueskill.Rating(mu=38.168, sigma=0.579) 9 / 202
trueskill.Rating(mu=37.909, sigma=0.579) 112 / 194
trueskill.Rating(mu=38.314, sigma=0.580) 22 / 182
trueskill.Rating(mu=39.261, sigma=0.580) 10 / 177
trueskill.Rating(mu=38.636, sigma=0.579) 37 / 171
trueskill.Rating(mu=39.591, sigma=0.580) 16 / 166
trueskill.Rating(mu=39.939, sigma=0.582) 2 / 168
trueskill.Rating(mu=39.716, sigma=0.581) 37 / 186

Player 2 (the best player according to trueskill):
trueskill.Rating(mu=41.308, sigma=2.696) 134 / 139
trueskill.Rating(mu=76.557, sigma=1.677) 69 / 132
trueskill.Rating(mu=69.771, sigma=1.357) 115 / 128
trueskill.Rating(mu=72.300, sigma=1.146) 83 / 129
trueskill.Rating(mu=75.554, sigma=1.035) 95 / 116
trueskill.Rating(mu=78.606, sigma=0.942) 87 / 115
trueskill.Rating(mu=87.675, sigma=0.878) 5 / 80
trueskill.Rating(mu=88.466, sigma=0.814) 72 / 122

sublee commented 6 years ago

I don't understand your issue. Please explain again with short sentences to let me know:

  1. what "FFA" means
  2. what the expected result is
  3. which result is weird

jhansen461 commented 6 years ago

  1. FFA = free-for-all. Basically a leaderboard rather than a single winner/loser. The ranking is what is at the end of each line.
  2. I am expecting player 1's trueskill (37.973) to be greater than player 2's trueskill (86.024).
  3. Player 2's mu goes up very quickly (with sigma constantly decreasing) despite only doing very well in the second-to-last result. Even the 5 / 80 result decreased sigma, although player 2 is in the bottom half in every other game. Meanwhile, player 1 often gets 1st place, yet has a trueskill that is substantially lower than player 2's.

sublee commented 6 years ago

Can I get the full match result set to understand each number? I guess Player 2 has not finished enough games.

bernd-wechner commented 6 years ago

Without the full match history there's no comment to make here. sublee is being very patient with you, jhansen461. The bottom line is that if trueskill thinks player 2 is the best, there are generally three possible reasons for this:

1) Player 2 has beaten Player 1 a lot (this effectively rewards Player 2 over Player 1 with mu growth).
2) Player 2 has beaten more people than Player 1 (this also generally rewards Player 2 with more mu growth than Player 1).
3) Player 2 has played more games than Player 1 (over time sigma shrinks, so the ranking, which is mu - 3*sigma, goes up simply by virtue of recording results, any results).
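The mu - 3*sigma ranking from point 3 can be checked against the final ratings posted earlier in the thread. The sketch below is plain arithmetic and does not call the library. The function name `conservative_rating` is just an illustrative label, and the inputs are the last rating listed for each player.

```python
def conservative_rating(mu, sigma, k=3.0):
    """Ranking value described in the thread: mu minus k standard deviations."""
    return mu - k * sigma


# Final ratings copied from the lists posted in the thread.
player_1 = conservative_rating(39.716, 0.581)
player_2 = conservative_rating(88.466, 0.814)

print(f"Player 1: {player_1:.3f}")  # Player 1: 37.973
print(f"Player 2: {player_2:.3f}")  # Player 2: 86.024
```

These match the 37.973 and 86.024 figures quoted in the issue, so those numbers are mu - 3*sigma values rather than raw mu.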

It's not clear from your info at all in which games player 1 and 2 are playing together and which ones they aren't, whether the lists are exhaustive (i.e. the whole history of Player 1 and Player 2), and what your external identifier is.

The only way to be sure of what's going on is to step through all the match results and appraise them. But you can do that too. Find one that puzzles you, look at the rankings and the updates, and ask questions about that.
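One way to start that kind of appraisal is sketched below, using only Player 2's numbers as posted. It rests on two assumptions that the thread does not confirm: that each listed rating is the value after the result on the same line, and that the player started at the stated mu of 25. The "puzzling" flag (mu rose despite a bottom-half finish) is only a rough heuristic for picking matches to look at, not a TrueSkill rule.

```python
INITIAL_MU = 25.0  # starting mu from the setup stated in the thread

# (mu, sigma, rank, field_size) for Player 2, copied from the thread.
# Assumption: each rating is the value AFTER the listed result.
PLAYER_2 = [
    (41.308, 2.696, 134, 139),
    (76.557, 1.677, 69, 132),
    (69.771, 1.357, 115, 128),
    (72.300, 1.146, 83, 129),
    (75.554, 1.035, 95, 116),
    (78.606, 0.942, 87, 115),
    (87.675, 0.878, 5, 80),
    (88.466, 0.814, 72, 122),
]


def appraise(history, initial_mu=INITIAL_MU):
    """Return one row per match: (index, rank, field, mu_change, bottom_half)."""
    rows = []
    previous_mu = initial_mu
    for index, (mu, _sigma, rank, field) in enumerate(history):
        mu_change = round(mu - previous_mu, 3)
        bottom_half = rank > field / 2
        rows.append((index, rank, field, mu_change, bottom_half))
        previous_mu = mu
    return rows


rows = appraise(PLAYER_2)

# Heuristic: mu went up although the player finished in the bottom half.
puzzling = [row for row in rows if row[3] > 0 and row[4]]
largest = max(rows, key=lambda row: row[3])

for index, rank, field, mu_change, _ in puzzling:
    print(f"match {index}: finished {rank} / {field}, mu changed by {mu_change:+.3f}")
```

Under these assumptions the largest single jump is the second listed result (69 / 132), which would be a natural match to step through in full.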