hbx-luv / codetango

The best way to play Codenames online
https://play.hbx.vision/
GNU General Public License v3.0
2 stars, 3 forks

Incorrect ELO Calculation #34

Closed jjzabkar closed 4 years ago

jjzabkar commented 4 years ago

Summary

I believe the current ELO algorithm application is incorrect. A player’s ELO should only increase with a win. Case in point: this user has 3 wins out of 14 games, but I can see ELO upticks for more than 3 games.

This is possibly indicative of:

Repro

[Screenshot: Screen Shot 2020-07-14 at 8 39 38 AM]

Matttaylor8910 commented 4 years ago
Explain how a player’s ELO can go up if they lose?

Matt Taylor  9:46 AM
they performed better than anticipated
9:47
imagine the best foosball player playing the worst
9:47
they're expected to win 10 - 0
9:47
if the loser gets 9 points that's crazy
9:47
the loser is awarded
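
For readers following along, the idea of rewarding a loser who beats expectations can be sketched roughly as below. This is only an illustration, not the project's code: the function names, the K value of 32, the 400-point scale, and the choice to measure "performance" as a share of total points are all assumptions made for this example.

```python
def expected_share(rating: float, opponent: float) -> float:
    """Standard Elo expectation on a 400-point logistic scale (assumed)."""
    return 1.0 / (1.0 + 10 ** ((opponent - rating) / 400.0))


def score_aware_delta(rating: float, opponent: float,
                      points_for: int, points_against: int,
                      k: float = 32.0) -> float:
    """Rating change when the 'actual' result is the share of points scored.

    Hypothetical helper: treating points_for / total as the actual score
    is an assumption for illustration, not the formula used in the repo.
    """
    total = points_for + points_against
    if total == 0:
        return 0.0  # no points scored by either side, nothing to learn
    actual = points_for / total
    return k * (actual - expected_share(rating, opponent))


# A 1200-rated underdog loses 9-10 to a 1600-rated favorite.
underdog_change = score_aware_delta(1200, 1600, 9, 10)
favorite_change = score_aware_delta(1600, 1200, 10, 9)
print(underdog_change > 0)  # the loser gains rating
print(favorite_change < 0)  # the winner loses rating
```

Under these assumptions the underdog was expected to take far less than half the points, so a narrow loss still counts as overperforming, and the two changes cancel out.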

JJ Zabkar  9:47 AM
Then you have a problem with order of operations
9:47
ELO for team should use a pool

Matt Taylor  9:47 AM
elo for a team is a dumb average of the players' elos (for now)
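
A minimal sketch of what a plain-average team rating looks like, for context. The function name `team_rating` is hypothetical and this is not taken from the repository.

```python
from statistics import mean
from typing import Sequence


def team_rating(player_ratings: Sequence[float]) -> float:
    """Team rating as the unweighted mean of its players' ratings."""
    if not player_ratings:
        raise ValueError("a team needs at least one player")
    return float(mean(player_ratings))


# Two players at 1200 and 1300 form a 1250-rated team.
print(team_rating([1200, 1300]))
```

An unweighted mean treats a 1000 + 1400 pairing the same as two 1200 players, which is the kind of flattening the "dumb" label refers to.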

JJ Zabkar  9:48 AM
But a losing player should maintain *at best* with a loss, but never go up

Matt Taylor  9:48 AM
definitely open to ideas for a more representative elo rating for a team
9:48
I disagree with that, JJ

JJ Zabkar  9:48 AM
> dumb average
Ah, I see your characterization then. I disagree! It can be smarter, if you use a pool strategy

Matt Taylor  9:49 AM
over the 4 years we used this algorithm for foosball, we saw interesting results for rewarding the losers that performed better

JJ Zabkar  9:49 AM
I’ll see if I can decipher your ELO code.
WRT seed: Do you seed initial players at the standard 1200?

Matt Taylor  9:49 AM
they had more motivation to play the top players
9:49
and the top players had to really try to not perform worse than predicted
9:49
it was a net positive
9:49
yes, everyone starts at 1200

JJ Zabkar  9:50 AM
> losers that *performed* better
Does this mean that you’re taking the final score into account in your ELO calculation?

Matt Taylor  9:50 AM
yes

JJ Zabkar  9:50 AM
I know that 538 does that for some sports

Matt Taylor  9:50 AM
it's not just 1 - 0

JJ Zabkar  9:50 AM
ah, ok
9:50
that makes more sense then

Matt Taylor  9:50 AM
the max score is 10 points (you made contact with all your agents)

JJ Zabkar  9:51 AM
Interesting decision though. Really small sample size for that. Is a 0-1 victory really three times (linearly?) as valuable as a 0-3 victory?

Matt Taylor  9:51 AM
we had a discussion about scoring starting here:
https://tangocard.slack.com/archives/C0152DQ4WV7/p1594303219000400

Alex Bacon
You can convince me otherwise, but a win is a win and a loss is a loss
Posted in #topic-guild-codetango | Jul 9th

9:51
tbh I don't remember if it's linear
9:52
we adjusted it a lot over 4 years and found something that worked for our case (foosball)
9:52
definitely open for changes for this game
9:52
they're very different, I just came up with a base implementation to get some stats rolling

JJ Zabkar  9:53 AM
I just know that when I did team ELO (for 5 years, ha!), there was a lot more “bounce” in ELO scores. The scores I see here are really uncharacteristically steady to me. Maybe it’s your d-factor; with Bob winning 10+ in a row, I would expect his ELO to be well over 1400 by now.

Matt Taylor  9:56 AM
sweet! I've never worked with someone who also rolled their own. Feel free to change it/play around with it, I can invite you as a collaborator on GH and invite you to the console so you can peep data
9:56
really easy to recalc the elo rating from the beginning of time if we wanna make changes

JJ Zabkar  9:56 AM
No worries; I can clone/fork.

Matt Taylor  9:56 AM
🤘🏼

JJ Zabkar  9:57 AM
Really appreciate your receptiveness!

Matt Taylor  9:57 AM
of course dude! would love to learn a better way!
9:58
either way, it's decently accurate right now I'd say

JJ Zabkar  9:58 AM
for sure

Matt Taylor  9:58 AM
I expected to see Bob + Bacon at the top

copy/pasta discussion from slack for context

Matttaylor8910 commented 4 years ago

Resolved by implementing pure elo with a k factor of 32: https://github.com/hbx-luv/codetango/commit/bdd8c3eb207cb99eca2bfbec61c09aec8e98157b
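
For anyone landing here later, a win/loss-only Elo update with a K-factor of 32 looks roughly like the sketch below. It is written from the standard Elo formula and has not been checked against the linked commit. The function names and the 400-point scale are assumptions, and how the commit handles team ratings is not covered.

```python
K_FACTOR = 32  # K-factor named in the resolution comment


def expected_score(rating: float, opponent: float) -> float:
    """Probability-like expectation of winning under standard Elo."""
    return 1.0 / (1.0 + 10 ** ((opponent - rating) / 400.0))


def update(winner: float, loser: float, k: float = K_FACTOR) -> tuple[float, float]:
    """Return (new_winner, new_loser) for a single win/loss result.

    The actual score is 1 for the winner and 0 for the loser, so the
    winner always gains and the loser always loses the same amount.
    """
    delta = k * (1.0 - expected_score(winner, loser))
    return winner + delta, loser - delta


# Two evenly matched 1200 players: the winner takes 16 points from the loser.
print(update(1200, 1200))  # (1216.0, 1184.0)
```

With this scheme the margin of victory is ignored, so a player's rating can only rise after a win, which matches the behavior asked for in the issue summary.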