sublee / trueskill

An implementation of the TrueSkill rating system for Python
https://trueskill.org/

Ranks for games with scores like tennis/padel #54

Closed by dandaka 1 year ago

dandaka commented 1 year ago

Hey! I am currently using TrueSkill to rank padel players playing 2v2 games. After each game you get a score like "3:2" or similar. Should I use those scores as ranks? Or enter [1;0] for the winner? I can't find any documentation/examples for ranks, so any recommendations would be valuable!

bernd-wechner commented 1 year ago

You can use either, TrueSkill only uses the ranks to order the players from best to worst in terms of performance in that game.

That said, I doubt TrueSkill is a good model for this scenario. Alas, with 2 on 2 you will run into one of TrueSkill's weaknesses very quickly: the skill estimates (and hence the ratings) of the two winning players will not be improved equally by a victory, nor will those of the losers be reduced equally. This is simply an unfortunate feature of the TrueSkill definition, IMHO.

The difference does get lost over time as skills diverge, but only if the two players in a team play in other teams as well. If from the point of allocating default initial estimates they are always playing on the same team together, it will remain the case that they (perplexingly) have different ratings.

dandaka commented 1 year ago

Hey, first of all, thank you for this deep and quick reply! Much appreciated.

If I understand you correctly, a change in rating after a game depends on the player's initial rating. So if players have a different rating, their change is different.

We have a game of padel, where people mostly play within different teams (or at least change partners frequently). So I personally don't see a problem if each rating changes differently for each player.

Do you know of any other rating models that suit this type of setting better? I have briefly evaluated Elo, and TrueSkill looks like a much better model when we have:

- 2 vs 2
- partners change often
- a varying number of games per player

> two player sin a team play

What does "sin" mean here?

bernd-wechner commented 1 year ago

Yes, the new ratings that TrueSkill (or Elo) produces are a function of:

  1. The initial ratings of all players.
  2. The ranking of the players (or teams).
  3. In TrueSkill's case, the values of tuning parameters like Beta (modelling the role of luck in the game) and Tau (a very coarse and inadequate model for the loss of confidence in skill between games).

The issue I described is only really visible to purists, and to players at exactly the same skill (which is the default at the outset) playing as a team and emerging (inexplicably) with slightly different ratings after the game. The difference will be small but visible, and it gets lost in the noise if partners change often (as I said at the outset, it is only conspicuous when skills are identical).

You could work around this by not issuing ratings until the interactions are mature (in fact, all you need is for each player to have played in a team with two or more others, not just one), after which point this glitch in TrueSkill will no longer be visible.
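That gating could be sketched as follows, with hypothetical player names and a made-up `rating_is_mature` helper (this is not part of the trueskill API, just plain bookkeeping around it):

```python
from collections import defaultdict

# Track the distinct teammates each player has had.
partners = defaultdict(set)

def record_team(team):
    """team is a tuple of player names on one side of a 2v2 match."""
    for p in team:
        for q in team:
            if p != q:
                partners[p].add(q)

def rating_is_mature(player, min_partners=2):
    """Publish a rating only once the player has had 2+ distinct teammates."""
    return len(partners[player]) >= min_partners

record_team(("alice", "bob"))
record_team(("alice", "carol"))
print(rating_is_mature("alice"))  # alice has partnered with bob and carol
print(rating_is_mature("bob"))    # bob has only ever partnered with alice
```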

> What does "sin" mean here?

It means a typo got through my LanguageTool-supported efforts to write text cleanly in a hurry.