lwerdna / bugtrack

bughouse rating and tracking software
GNU General Public License v3.0

Possible for Rd=350 players to gain/lose 700-800 points #7

Closed scottbb closed 12 years ago

scottbb commented 12 years ago

Create a matchup for TeamA = {AndrewC 1129.350, TaylorG 1124.172}, TeamB = {Chris 1265.161, PeterL 1075.161}. It says AndrewC stands to gain 802 points or lose 726, depending on the outcome of this match. That's a completely ludicrous swing.

Replace AndrewC with anybody who has Rd=350 (TimB, LarryW, basically anybody who hasn't played with the new system yet) and you see the same thing.

Your scoring sucks! ;-)

[Scott: Edited your post to remove last names -andrew]

lwerdna commented 12 years ago

The true, difficult issue here is answering "DOES THE SCORING SUCK?". Maybe it's intended Glicko behavior, or maybe it's a miscalculation.

In the context of the Glicko system, RD=350 means the player is playing his first game. Glicko knows nothing about this player, so it's making desperate leaps to place him. How far he moves should also depend a lot on who he's playing against.

If he's playing against people with low RD, the system is confident that the opponents' ratings are accurate, so he deserves the HUGE jump if he wins.

And remember, even if he takes a huge jump or a huge loss, his RD is still going to be high for the next game, so you should expect another huge (but not as huge) jump. The jumps will get smaller and smaller as he plays more games and his RD shrinks, and he hopefully settles at his true rating.
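The shrinking jumps described above can be sketched with the published Glicko-1 update for a single game. This is only an illustration, not bugtrack's code: the function names are made up here, it treats the game as one player against one opponent (how the tracker combines the two bughouse opponents and the partner is not shown in this thread), and it applies no RD growth between rating periods. The ratings are borrowed from the matchup at the top of the issue.

```python
import math

Q = math.log(10) / 400  # Glicko-1 scaling constant


def g(rd):
    """Weight of a result, reduced when the opponent's RD is large."""
    return 1 / math.sqrt(1 + 3 * Q**2 * rd**2 / math.pi**2)


def glicko1_update(r, rd, opp_r, opp_rd, score):
    """One Glicko-1 update from a single game; score is 1, 0.5 or 0."""
    g_opp = g(opp_rd)
    expected = 1 / (1 + 10 ** (-g_opp * (r - opp_r) / 400))
    inv_d2 = Q**2 * g_opp**2 * expected * (1 - expected)  # this is 1 / d^2
    denom = 1 / rd**2 + inv_d2
    new_r = r + (Q / denom) * g_opp * (score - expected)
    new_rd = math.sqrt(1 / denom)
    return new_r, new_rd


# Hypothetical newcomer (RD=350) winning five games in a row
# against the same established opponent (1265, RD=161).
r, rd = 1129.0, 350.0
gains = []
for game in range(1, 6):
    new_r, new_rd = glicko1_update(r, rd, 1265.0, 161.0, score=1)
    gains.append(new_r - r)
    print(f"game {game}: {r:7.1f} -> {new_r:7.1f} (RD {rd:5.1f} -> {new_rd:5.1f})")
    r, rd = new_r, new_rd
```

Each win is worth less than the one before it, and the RD drops after every game, which is the "smaller and smaller" behavior. Comparing the first printed gain with what the live site reports is one way to tell intended behavior from a miscalculation.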

But again, solving this issue means coming up with a supported answer to whether we've made a coding error or this is how it's supposed to work.

thompGIT commented 12 years ago

Does this issue change if all players start at 1500, as in traditional Elo ratings?

scottbb commented 12 years ago

Andrew, I have to disagree just based on the "sanity" of the outcome. If AndrewC won, he would be the highest ranked person we have EVER had in the scoring, at 1931. I don't think we've had anybody above 1800. If he lost, which is likely, he would be at 403, even lower than AndrewW, in just a single game.

I think some calibration constants are off. That wide of a swing just doesn't make sense.

Thomp, I don't know. I just observed this behavior on the live site when I was showing the tracker to Rusty. I will try to repro with a local copy with different data.

lwerdna commented 12 years ago

It's insane because we have an intuition of what these players should be at - the system has no such knowledge. The system assumes that the player will play many games (Glicko says 5-10 per rating period) and with this frequency, any initial huge jumps will level out.

It's like in starcraft when you win your placement matches by luck at the beginning of a new season and you place higher than expected. No way around that: it will level out over time.

At worst, the rating of a player with a huge RD and hardly any games should not be taken seriously.

Pull the latest code, wget the players.dat, and re-run the tests (a bug was fixed and the rating period adjusted). Let me know what you think.

lwerdna commented 12 years ago

fixed in e1375fb

(by fixed I mean that the ratings adjustments seem more sane and I cannot reproduce the scenario where someone could lose more rating than they have, as was the case with the Lee example)

lwerdna commented 12 years ago

didn't mean to close this one, only the issue explicitly tagged as "bug"... we can continue discussion here

thompGIT commented 12 years ago

if it helps any, check my score history on FICS. I luckily won my first match and was 1700 or so immediately. that score sank like the Titanic after I played a few games.

lwerdna commented 12 years ago

oh good example! bughouse-db.org actually has you at (0) for your first 20 games, and then (1597) at game 2253610....

so that's another way we could solve it: just start the ratings in an invalid state and wait until a certain threshold of games has been played before the rating becomes valid

to keep the code simple, we could just put this on the data-mining side: like the leader board could have a toggle button (on by default) to only show players with 50 games or so
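A minimal sketch of that toggle, assuming each player record carries a games-played count. The dict layout, the `leaderboard` function and the `MIN_GAMES` name are hypothetical here and are not taken from the bugtrack code.

```python
MIN_GAMES = 50  # threshold suggested above; tune as needed


def leaderboard(players, only_established=True, min_games=MIN_GAMES):
    """Return players sorted by rating, optionally hiding provisional ones."""
    shown = [
        p for p in players
        if not only_established or p["games"] >= min_games
    ]
    return sorted(shown, key=lambda p: p["rating"], reverse=True)


# Made-up sample data for illustration only.
players = [
    {"name": "veteran", "rating": 1450.0, "games": 120},
    {"name": "newcomer", "rating": 1700.0, "games": 1},
    {"name": "regular", "rating": 1300.0, "games": 64},
]

for p in leaderboard(players):  # toggle on by default
    print(p["name"], p["rating"], p["games"])
```

The ratings themselves are untouched; only the display filters out players below the threshold, so the rating code stays simple.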

thompGIT commented 12 years ago

don't get too excited. i'm pretty sure most of the first batch of games I played were unranked.

lwerdna commented 12 years ago

I think I've read that bughouse-db only logs rated games ...excitement, remaining

lwerdna commented 12 years ago

I've changed my mind and think I was wrong on the issue of whether all players should start with the same score.

I argued against it, claiming that the new players would be immediate targets. That it would be like a gold rush to get their points.

Well, the Glicko system really shines in this scenario. A well-established player (low RD) does not stand to gain much at all from a new (and possibly overrated) player (high RD).

Choose any fair matchup of established (low RD) players, then substitute in someone brand new like Lee or Joe. You'll see that his points are not able to be "stolen".
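A rough way to check this claim is to compare what an established player gains from beating a newcomer with what he gains from beating an equally rated established player. The sketch below uses the standard single-game Glicko-1 rating update; the function name and the sample ratings are made up for illustration, and it ignores how the tracker combines teammates in a bughouse game.

```python
import math

Q = math.log(10) / 400  # Glicko-1 scaling constant


def rating_gain(r, rd, opp_r, opp_rd, score):
    """Rating change from one game under the Glicko-1 update."""
    g_opp = 1 / math.sqrt(1 + 3 * Q**2 * opp_rd**2 / math.pi**2)
    expected = 1 / (1 + 10 ** (-g_opp * (r - opp_r) / 400))
    inv_d2 = Q**2 * g_opp**2 * expected * (1 - expected)
    return Q / (1 / rd**2 + inv_d2) * g_opp * (score - expected)


# Hypothetical established player (1200, RD=50) beating a 1000-rated opponent.
gain_vs_newcomer = rating_gain(1200.0, 50.0, 1000.0, 350.0, score=1)
gain_vs_established = rating_gain(1200.0, 50.0, 1000.0, 50.0, score=1)

print(f"win vs newcomer (RD=350):   {gain_vs_newcomer:+.2f}")
print(f"win vs established (RD=50): {gain_vs_established:+.2f}")
```

Under these assumptions the win over the newcomer is worth a handful of points at most, and less than the same win over an established 1000-rated player, because the opponent's high RD discounts the result.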

The scores and recalc you see are done with all players initially given a 1000 rating and 350 RD (even the ones we know are weaker). This is the way it should remain, I think. Thoughts?