lightvector / arimaa-server

Arimaa server

Ratings system improvements #114

Open lightvector opened 8 years ago

lightvector commented 8 years ago

Implement a better ratings system. Something based on WHR (Whole-History Rating) would be nice, including for retroactively taking into account games that were unrated after being played. This would probably involve some work thinking about how to structure the back-end update process, and some thought on how to deal with unusual user behaviors, how bots should differ, global rating anchors and adjustments, etc.

mattj256 commented 8 years ago

It sounds like we need metric(s) of what constitutes a good rating system. For the record, there was a user on Arimaa.com who recently single-handedly changed the ratings of a number of bots by more than a hundred points each by losing repeatedly to a weak bot, then winning repeatedly against stronger bots. One possible metric is how much a single user can disproportionately affect another user's rating (less is better). One metric is that the rating system should not be too CPU-intensive to compute. One metric is that two players' ratings should tell you something about the probability that one of them will win against the other. One metric is that the ratings should be stable over time: the strength of a 1400 player today should be comparable to the strength of a 1400 player last week or last year. Also, if a player wins repeatedly against a very weak opponent, should the winner's rating rise arbitrarily high or stabilize at some value?
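To make the "probability of winning" metric and the weak-opponent question concrete, here is a minimal sketch using the classic Elo logistic curve. This is an illustration only, not what Arimaa.com or this server uses. The function names, the K-factor of 32, and the choice to hold the weak opponent's rating fixed are all assumptions.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Elo-style probability that A beats B (logistic curve, 400-point scale)."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


def elo_update(rating: float, opponent: float, score: float, k: float = 32.0) -> float:
    """Return the new rating after one game; score is 1.0 for a win, 0.0 for a loss."""
    return rating + k * (score - expected_score(rating, opponent))


def farm_weak_opponent(start: float, weak: float, games: int, k: float = 32.0) -> float:
    """Rating after `games` straight wins over an opponent whose rating is
    held fixed at `weak` (an assumption made to isolate the effect)."""
    rating = start
    for _ in range(games):
        rating = elo_update(rating, weak, 1.0, k)
    return rating


if __name__ == "__main__":
    print(expected_score(1500, 1500))  # equal ratings give 0.5
    for n in (10, 100, 1000):
        print(n, round(farm_weak_opponent(1500, 1000, n), 1))
```

Under this plain Elo rule the gain per win shrinks as the gap grows but never reaches zero, so the winner's rating keeps creeping upward rather than stabilizing. Whether that is acceptable is exactly the kind of thing a metric would have to decide.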

There's also a policy element here: the Free Internet Chess Server has explicit policies prohibiting certain types of cheating and rating manipulation. (For example, rules 12, 13, and 15.) http://www.freechess.org/Help/HelpFiles/abuse.html

For what it's worth I would say why not just implement plain old regular WHR? If you want to allow for retroactively unrating a game, I don't know if there's a better solution than periodically recomputing every single player's rating.
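A rough sketch of the "periodically recompute everything" idea follows. It replays the full game history from scratch and skips any game currently flagged as unrated. It uses a plain Elo update as a stand-in, not WHR, and the `Game` record, the `rated` flag, the starting rating of 1500, and the K-factor are hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Game:
    winner: str
    loser: str
    rated: bool = True  # flipped to False if the game is unrated after the fact


def recompute_all(games, initial=1500.0, k=32.0):
    """Rebuild every rating from scratch by replaying games in order.

    Games with rated=False are ignored, so retroactively unrating a game
    only requires changing the flag and running the recompute again.
    """
    ratings = {}
    for game in games:
        if not game.rated:
            continue
        rw = ratings.get(game.winner, initial)
        rl = ratings.get(game.loser, initial)
        # Elo expected score for the winner (stand-in for a WHR update).
        expected_w = 1.0 / (1.0 + 10.0 ** ((rl - rw) / 400.0))
        delta = k * (1.0 - expected_w)
        ratings[game.winner] = rw + delta
        ratings[game.loser] = rl - delta
    return ratings


if __name__ == "__main__":
    history = [Game("alice", "bob"), Game("bob", "alice", rated=False)]
    print(recompute_all(history))
```

The cost is linear in the number of games per recompute, which ties back to the CPU-intensity metric above: a full replay is simple, but how often it can run depends on the size of the history.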

clyring commented 8 years ago

FWIW, also look at the line notes at https://github.com/lightvector/arimaa-server/commit/ded9d1aa743a2d594e2e60affd44b37227964d66 for some more thoughts on the computational end of this. Of your metrics:

As I have a lot of personal experience working with ratings-related systems, this is probably something I will spend a lot of time tinkering with down the road.

Another consideration to keep in mind in design of the rest of the site to facilitate later rating system improvement: "Have at least some information on the new players entering the system." In particular, we will probably want to handle each of the following cases differently:

mattj256 commented 8 years ago

I'm out of my league here. I understand Glicko and WHR conceptually, but not well enough to implement them myself. And I definitely don't understand the math.

When I used to play games on Yahoo Games, I remember your rating was marked as "provisional" until you had played a certain number of games. If I remember correctly, they didn't publicly display the ratings for provisional players. (This could avoid the problem clyring mentioned where a new player with high uncertainty splits his first two games against the same opponent and his rating jumps by a large amount.)
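As a small sketch of the provisional idea, the display layer could hide the number until a player has enough games. The threshold of 20 games and the function name are assumptions, not anything Yahoo Games or this server is known to use.

```python
PROVISIONAL_GAMES = 20  # hypothetical threshold; the right value is a policy choice


def display_rating(rating: float, games_played: int,
                   threshold: int = PROVISIONAL_GAMES) -> str:
    """Return the string shown publicly for a player's rating.

    The underlying rating is still tracked and updated normally; only the
    public display is withheld while the player is provisional.
    """
    if games_played < threshold:
        return "provisional"
    return str(round(rating))


if __name__ == "__main__":
    print(display_rating(1512.4, 3))
    print(display_rating(1512.4, 20))
```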

If a new player is strong, it could be useful to allow them to start with a high rating provided they meet some criteria, like defeating a few high-rated bots or solving a few tactics puzzles. This is similar to colleges that have a foreign language requirement and allow you to place out of the requirement by demonstrating proficiency.

I could imagine setting things up so that for the first X games a new player's rating is updated normally, but the opponent's rating change is calculated retroactively after the system has a better estimate of the new player's true strength. (I don't know if that's a good idea or not.)
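The deferred-update idea above could be sketched roughly as follows. The newcomer's rating moves after every game, while each established opponent's change is queued and only applied once the newcomer leaves the provisional period, using the newcomer's rating at that point. Everything here is an assumption for illustration: the class and method names, the Elo update as a stand-in for the real system, the threshold, and the choice to update both players normally when both or neither are provisional.

```python
from collections import defaultdict


def expected(ra: float, rb: float) -> float:
    """Elo-style probability that the player rated ra beats the player rated rb."""
    return 1.0 / (1.0 + 10.0 ** ((rb - ra) / 400.0))


class DeferredRatings:
    def __init__(self, provisional_games=5, initial=1500.0, k=32.0):
        self.provisional_games = provisional_games
        self.k = k
        self.ratings = defaultdict(lambda: initial)
        self.played = defaultdict(int)
        # newcomer -> list of (opponent, opponent's score) awaiting settlement
        self.pending = defaultdict(list)

    def is_provisional(self, player):
        return self.played[player] < self.provisional_games

    def _apply(self, player, opp_rating, score):
        gain = self.k * (score - expected(self.ratings[player], opp_rating))
        self.ratings[player] += gain

    def record(self, a, b, score_a):
        """Record one game; score_a is 1.0 if a won, 0.0 if a lost."""
        ra, rb = self.ratings[a], self.ratings[b]
        prov_a, prov_b = self.is_provisional(a), self.is_provisional(b)
        # An established player's change is deferred when facing a newcomer.
        if prov_b and not prov_a:
            self.pending[b].append((a, score_a))
        else:
            self._apply(a, rb, score_a)
        if prov_a and not prov_b:
            self.pending[a].append((b, 1.0 - score_a))
        else:
            self._apply(b, ra, 1.0 - score_a)
        for player in (a, b):
            self.played[player] += 1
            if not self.is_provisional(player):
                self._settle(player)

    def _settle(self, newcomer):
        """Apply queued opponent updates using the newcomer's current rating."""
        for opp, opp_score in self.pending.pop(newcomer, []):
            self._apply(opp, self.ratings[newcomer], opp_score)
```

One consequence worth noting: an opponent's rating is temporarily stale while their games against newcomers are pending, which may or may not be acceptable depending on how ratings are displayed.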