sublee / trueskill

An implementation of the TrueSkill rating system for Python
https://trueskill.org/

What does quality_1vs1 really show? #62

Open Ghotrix opened 2 weeks ago

Ghotrix commented 2 weeks ago

I've simulated a match of 10000 games between players A and B. A is stronger than B and is expected to score 0.71 on average. Given that expected score (P(W) + P(D)/2) and a win probability of P(W)=0.5, P(D) would be 0.42 and P(L) would be 0.08. Yet the simulation shows quality_1vs1 at ~0.882. Am I misunderstanding the quality function, or is something else going on?

bernd-wechner commented 2 weeks ago

I won't speak for that function now but quickly:

  1. The source code is easy to browse: https://github.com/sublee/trueskill/blob/master/trueskill/__init__.py#L516

  2. Match quality in TrueSkill is a measure of the probability of a draw given the skills of the two players (within what they call the draw margin, a configuration for the game). It is intended to describe the quality of the match, in the sense that a match is much more fun if the players are equally skilled and much less fun (a lower quality match) if one player is far better than the other and you can predict up front that they'll win. Match quality is a function only of the current ratings of the competing players and has nothing to do with their history (or the 10000 games). What you presumably want to look at after a 10000-game test is the ratings of the two players, and from those assess each player's probability of victory. I don't think this library implements win_probability, but I did, and if you're keen I can take a closer look later.
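To make the difference concrete, here is a minimal sketch in plain Python. It is not the library's code: the function names are hypothetical, `BETA` is assumed to be the package default of 25/6, the quality formula is the two-player draw-probability expression from the TrueSkill paper, and the win probability ignores the draw margin entirely.

```python
import math

BETA = 25 / 6  # assumption: the package's default beta (default sigma / 2)


def _c_squared(sigma_a, sigma_b, beta=BETA):
    # Total variance of the performance difference between the two players.
    return 2 * beta ** 2 + sigma_a ** 2 + sigma_b ** 2


def match_quality_1vs1(mu_a, sigma_a, mu_b, sigma_b, beta=BETA):
    # Hypothetical helper: two-player match quality (relative draw
    # probability). Depends only on the current ratings.
    c2 = _c_squared(sigma_a, sigma_b, beta)
    scale = math.sqrt(2 * beta ** 2 / c2)
    return scale * math.exp(-((mu_a - mu_b) ** 2) / (2 * c2))


def win_probability_1vs1(mu_a, sigma_a, mu_b, sigma_b, beta=BETA):
    # Hypothetical helper: probability that A outperforms B, with draws
    # ignored (draw margin treated as zero).
    c = math.sqrt(_c_squared(sigma_a, sigma_b, beta))
    z = (mu_a - mu_b) / c
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))  # standard normal CDF


# Made-up ratings: A is somewhat stronger and both ratings have converged.
print(match_quality_1vs1(27.0, 1.0, 24.0, 1.0))
print(win_probability_1vs1(27.0, 1.0, 24.0, 1.0))
```

With two fresh default ratings (mu=25, sigma=25/3) the quality comes out at about 0.447 and the win probability at exactly 0.5. Once the sigmas shrink after many games, the quality can be high even when one player is clearly favoured, so a quality value should not be read as an expected score.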