Very large free-for-alls produce negative, positive ratings outside the expected range

jaguilar commented 6 years ago

I made a free-for-all consisting of 20k default-initialized players. The top-ranking players in a simulated game had ratings of over six hundred with the default trueskill settings. The bottom ranking players had negative ratings. I had been under the impression that the default settings would generate ratings between zero and fifty. Is this a bug in the Python version of the code, or the algorithm itself?

Code, in case you don't have access to Colaboratory or lack a Google account:

# -*- coding: utf-8 -*-
"""TrueSkill Surprising Ratings

Automatically generated by Colaboratory.

Original file is located at
    https://colab.research.google.com/notebook#fileId=1OctL8znwKZUvthK5_rv7KKfpUK4oKmYv

Start by installing dependencies.
"""

!pip install trueskill

import trueskill as ts

ts.setup(backend='mpmath')

# Give us more precision.
import mpmath
mpmath.mp.dps = 25

"""Generate some test data."""

nplayers = 20000

players = [ts.Rating() for x in range(nplayers)]
teams = [(players[i],) for i in range(nplayers)]

import random
# random.shuffle shuffles in place and returns None, so build the list first.
ranks = list(range(nplayers))
random.shuffle(ranks)

"""Do an update. This is an nplayers-way free-for-all. We would expect the ratings to remain within 0-50, but the top ratings end up way over 50. This may also explain the very low ratings at the bottom."""

new_ratings = ts.rate(teams, ranks=ranks)

sorted(new_ratings, key=lambda x: x[0].mu)[:10]

"""[(trueskill.Rating(mu=-5898.653, sigma=3.727),),
 (trueskill.Rating(mu=-5898.052, sigma=3.727),),
 (trueskill.Rating(mu=-5897.454, sigma=3.727),),
 (trueskill.Rating(mu=-5896.859, sigma=3.727),),
 (trueskill.Rating(mu=-5896.264, sigma=3.727),),
 (trueskill.Rating(mu=-5895.670, sigma=3.727),),
 (trueskill.Rating(mu=-5895.076, sigma=3.727),),
 (trueskill.Rating(mu=-5894.482, sigma=3.727),),
 (trueskill.Rating(mu=-5893.889, sigma=3.727),),
 (trueskill.Rating(mu=-5893.295, sigma=3.727),)]"""

sorted(new_ratings, key=lambda x: x[0].mu)[-10:]

"""[(trueskill.Rating(mu=5943.295, sigma=3.727),),
 (trueskill.Rating(mu=5943.889, sigma=3.727),),
 (trueskill.Rating(mu=5944.482, sigma=3.727),),
 (trueskill.Rating(mu=5945.076, sigma=3.727),),
 (trueskill.Rating(mu=5945.670, sigma=3.727),),
 (trueskill.Rating(mu=5946.264, sigma=3.727),),
 (trueskill.Rating(mu=5946.859, sigma=3.727),),
 (trueskill.Rating(mu=5947.454, sigma=3.727),),
 (trueskill.Rating(mu=5948.052, sigma=3.727),),
 (trueskill.Rating(mu=5948.653, sigma=3.727),)]"""

sublee commented 6 years ago

I don't think TrueSkill guarantees the 0-50 range. I'm not aware of any limit on the range of ratings. A mu=-1 player is just weaker than a mu=0 player.

In my opinion, this isn't weird behavior for a very large game. You can get ratings outside 0-50 with only 2 players if they play enough games:

>>> from trueskill import Rating, rate_1vs1
>>> a, b = Rating(), Rating()
>>> for x in range(100000):
...     a, b = rate_1vs1(a, b)
>>> a.mu
65.42929086439979
>>> b.mu
-15.429290864397833
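
For context on where the 0-50 expectation probably comes from: the package's default prior is mu=25, sigma=25/3, so the band mu ± 3·sigma of a fresh rating spans roughly 0 to 50. That describes the starting belief only and does not bound mu after updates. A minimal sketch using plain arithmetic; the helper name `three_sigma_band` is hypothetical and not part of the trueskill API:

```python
# Default prior of the trueskill package: mu = 25, sigma = 25 / 3.
MU = 25.0
SIGMA = MU / 3


def three_sigma_band(mu, sigma):
    """Return (mu - 3*sigma, mu + 3*sigma) for a Gaussian rating."""
    return mu - 3 * sigma, mu + 3 * sigma


# A fresh rating: the band is approximately (0, 50).
low, high = three_sigma_band(MU, SIGMA)
print(low, high)

# The top rating reported in this issue: the whole band lies far above 50.
low_top, high_top = three_sigma_band(5948.653, 3.727)
print(low_top, high_top)
```

Nothing in this calculation clamps mu, which is consistent with the results above.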

jaguilar commented 6 years ago

Hrm. I think I misunderstood your comment about ratings being too low in #5 to mean that they should not be below zero. The true cause of my zero division errors is that some time during the calculation, the highly rated players are thought to have way too high of a probability of winning against an unranked player. So maybe I need to change my parameters so that mu updates more slowly.
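
To illustrate how such a probability can degenerate, here is a rough sketch using the commonly used pairwise approximation P(a beats b) = Φ((mu_a − mu_b) / sqrt(2·beta² + sigma_a² + sigma_b²)). This is an assumption for illustration: it is not necessarily what the library computes internally, and the thread does not say where the division by zero actually occurs. `win_probability` is a hypothetical helper; beta is set to the package default of sigma/2.

```python
import math

# Default beta of the trueskill package: sigma / 2 = (25 / 3) / 2.
BETA = 25.0 / 6


def win_probability(mu_a, sigma_a, mu_b, sigma_b, beta=BETA):
    """Approximate probability that player a beats player b (1 vs 1)."""
    delta = mu_a - mu_b
    denom = math.sqrt(2 * beta ** 2 + sigma_a ** 2 + sigma_b ** 2)
    # Normal CDF written with erfc so the far tail is computed directly.
    return 0.5 * math.erfc(-delta / (denom * math.sqrt(2)))


top = (5948.653, 3.727)      # highest rating reported in this issue
fresh = (25.0, 25.0 / 3)     # default-initialized player

p_top_wins = win_probability(*top, *fresh)
p_fresh_wins = win_probability(*fresh, *top)
print(p_top_wins, p_fresh_wins)

# Any code that divides by the smaller probability fails in double precision.
try:
    1 / p_fresh_wins
except ZeroDivisionError:
    print("underflow: the fresh player's win probability is exactly 0.0")
```

The two ratings are hundreds of standard deviations apart, so the tail probability underflows to zero in floating point. Smaller per-game updates to mu would keep that gap from growing so quickly.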

sublee commented 6 years ago

@jaguilar Okay. Can we close this issue?

jaguilar commented 6 years ago

Yes, thanks.


sublee / trueskill

Very large free-for-alls produce negative, positive ratings outside the expected range #22