sublee / trueskill

An implementation of the TrueSkill rating system for Python
https://trueskill.org/

Documentation on free-for-all #60

Closed TypicallyThomas closed 1 month ago

TypicallyThomas commented 5 months ago

Hi there. Great library!

I was reading the documentation and I feel like there's very little information on free-for-all games, especially when it comes to non-zero-sum games. I'm unsure how to rate players in a non-zero-sum game like racing, for example. How do you take distance from winning into account? Do I just input a list of the results, normalized to between 0 and 1? In any case, I feel this could be clearer in the docs. Thanks!

lifebound commented 1 month ago

Would you not just use the ranking table feature?

For example, in a race between Alice, Bob, Carol with the following race results:

  1. Carol
  2. Alice
  3. Bob

you should be able to do something like

from trueskill import TrueSkill

env = TrueSkill()
# One-player teams for a free-for-all; lower rank is better (Carol 1st, Alice 2nd, Bob 3rd).
rating_groups = [{alice: alice.rating}, {bob: bob.rating}, {carol: carol.rating}]
rated_rating_groups = env.rate(rating_groups, ranks=[1, 2, 0])
for i, player in enumerate([alice, bob, carol]):
    player.rating = rated_rating_groups[i][player]

TypicallyThomas commented 1 month ago

This is how I've been trying it, but the results don't work well. Say, for example, you're grading performance within a team of racers: Alice is P1, Bob is P5, Charlie is P13. Charlie is the worst within that team, but if there are 20 racers, Charlie could have been as bad as P20. I'm basically looking for something that takes this distance into account, so that Charlie in P13 scores better than he would in P20, even if both times he's the worst in the team.

bernd-wechner commented 1 month ago

How do you take distance from winning into account, for example?

TrueSkill takes no account of the distance from winning at all, though it depends on what you mean by that. The only distance it considers is in rankings, and rankings alone; it takes no account of scores, for example. Importantly, when players are grouped into teams, the whole team shares one ranking.
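
To make that concrete, here is a minimal sketch against the library's rate() API (the two-team scenario is made up for illustration). The only inputs are the team compositions and the ranks; the margin of victory never enters the call, so a blowout and a photo finish produce identical updates:

from trueskill import TrueSkill, Rating

env = TrueSkill()
team_a = (Rating(), Rating())  # two default-rated players
team_b = (Rating(), Rating())
# Only the ranking is observed: rank 0 beats rank 1.
# Whether team_a won by one point or fifty, this call is the same.
new_a, new_b = env.rate([team_a, team_b], ranks=[0, 1])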

Say Alice is P1, Bob is P5, Charlie is P13. Charlie is the worst within that team, but if there's 20 racers, Charlie could have been as bad as P20.

It's simply not clear what you're asking here. The only sensible way to model this 20-racer race with TrueSkill is to enter the ranks of all 20 racers. TrueSkill will then adjust its skill estimates based on those rankings, and it will know the difference between a P13 and a P20 finish. What are you asking?
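
For instance, rating the whole field at once might look like the sketch below (the racer names and the finishing order are placeholders, not anything from this thread), with rank 0 meaning first place:

from trueskill import TrueSkill, Rating

env = TrueSkill()
racers = ['racer%02d' % i for i in range(20)]   # placeholder names
ratings = {name: Rating() for name in racers}
# One single-racer team per entrant, as in any free-for-all.
rating_groups = [{name: ratings[name]} for name in racers]
finish_order = list(range(20))                  # placeholder result: racer00 wins
rated = env.rate(rating_groups, ranks=finish_order)
for group in rated:
    ratings.update(group)

With this, Charlie finishing 13th and Charlie finishing 20th are different pieces of evidence and yield different updates automatically.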

I'm basically looking for something that takes this distance into account, so that Charlie in P13 scores better than he would in P20, even if both times he's the worst in the team

I admit you're confusing me here. What do P13 and P20 mean? Is that place, or ranking? So if Charlie comes in place 13, you want him to score better than if he comes in place 20?

If I have understood right (and I'm by no means sure I have) then two things jump out at me:

  1. TrueSkill does not assign scores, so I'm not sure what you're looking for here. TrueSkill makes skill estimates based on performance measures; that is all. The current skill assessment comes as a pair of numbers (a mean, mu, and an uncertainty, sigma) which can be combined into one number known as a rating, which is most definitely not a score. Your rating as a player is the product of your entire recorded history of performance, not of this one race!

  2. If you enter all 20 racers as a ranked list, then when Charlie comes in at 13th place he will get a better rating update than if he comes in last at 20th place. That is the very point: there's a new piece of evidence (Charlie is a 13th-place performer, or a 20th-place performer) and TrueSkill uses it to update its latest estimate of his skill. There are two things to notice here (a sketch after this list illustrates the second): a) his rating will probably go down in both cases; a 13th or 20th place means he's running below the midpoint and is generally a pretty bad racer, but it's just one result, so TrueSkill would tend to rate him down only a bit (the amount is governed by a configuration parameter called beta, the skill sensitivity, i.e. how much we trust one single result versus the history of results); BUT b) there's another thing at play which is mildly complicated to explain (and I'm happy to if you want, or you can take it on trust for now): if this is his first, second, third, fourth, fifth or such race (i.e. he's a relative dark horse and we don't have much information about Charlie), then he starts with a default rating (configurable), which gets better with every new piece of evidence. Even if he's losing a lot in his first few races, his rating may go up, because TrueSkill essentially applies a confidence bonus, or more accurately an uncertainty penalty (the quoted rating is conventionally a conservative lower bound on skill, e.g. mu minus three sigma).
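
A quick sketch of that uncertainty penalty in code (the 20-newcomer field is my own made-up scenario; expose() is the library's conservative-rating helper):

from trueskill import TrueSkill, Rating

env = TrueSkill()
charlie = Rating()                      # default: mu=25, sigma=25/3
print(env.expose(charlie))              # conservative rating mu - 3*sigma == 0.0

# Hypothetical race: Charlie places 13th in a field of 20 newcomers.
others = [{i: Rating()} for i in range(19)]
groups = others[:12] + [{'charlie': charlie}] + others[12:]
rated = env.rate(groups, ranks=list(range(20)))
charlie = rated[12]['charlie']
# mu drops below 25 (a below-midfield result), but sigma shrinks more,
# so the conservative rating should still climb above its starting 0.
print(charlie.mu, charlie.sigma, env.expose(charlie))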

TypicallyThomas commented 1 month ago

Yeah, to be honest, I'm not sure TrueSkill is the correct rating system for my purposes. I've found Glicko-2 to give me more consistent and useful results in this regard, but it's the kind of use case that may not be fully solved by either.

bernd-wechner commented 1 month ago

Glicko is a chess rating system for two players; I'm not sure how you think that's going to work for 20-participant races, but good luck with it (if you find it handles multiplayer events, let me know). TrueSkill is essentially an extension of the Elo chess rating system to n-player, multi-team contexts; I'm not aware of any effort to do the same for Glicko.