odota / core

Open source Dota 2 data platform
https://www.opendota.com
MIT License

player rankings on heroes #729

Closed howardchung closed 8 years ago

howardchung commented 8 years ago

We still don't have every match, but we can build the infrastructure and it should extend to adding more matches later.

Design: We now have MMR data for 660k players.

Metric: MMR * wins on hero

Eligibility criteria would be:

We just run a periodic query (once a day?) something like:

SELECT wins.hero_id, r.account_id, wins.count, r.rating, wins.count * r.rating AS score FROM
(SELECT pm.hero_id, pm.account_id, count(*) AS count
FROM player_matches pm JOIN matches m ON pm.match_id = m.match_id
WHERE pm.account_id = ? AND (pm.player_slot < 64) = m.radiant_win
GROUP BY pm.hero_id, pm.account_id) wins

JOIN
(SELECT account_id, rating
FROM player_ratings pr
WHERE time = (SELECT MAX(time) FROM player_ratings WHERE account_id = ?)
AND account_id = ?) r
ON wins.account_id = r.account_id

JOIN
players ON r.account_id = players.account_id;

We can store the results in a table indexed by hero_id (for ranking within a hero) and account_id (for ranking of heroes for a player)

Then we can probably pull a toplist for each hero, percentile for a player on a hero, and percentile for a player for all heroes.
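A minimal sketch of how those three lookups could work, assuming the results table has been loaded as (hero_id, account_id, wins, rating) rows. The function names and sample rows are hypothetical, and "percentile" here means the share of players on a hero whose score is less than or equal to the given player's.

```python
from collections import defaultdict

# Hypothetical sample rows: (hero_id, account_id, wins_on_hero, rating)
ROWS = [
    (1, 101, 40, 3000),
    (1, 102, 10, 5000),
    (1, 103, 25, 4000),
    (2, 101, 5, 3000),
]

def build_scores(rows):
    """Compute score = wins * rating and group it by hero_id."""
    by_hero = defaultdict(dict)
    for hero_id, account_id, wins, rating in rows:
        by_hero[hero_id][account_id] = wins * rating
    return by_hero

def toplist(by_hero, hero_id, limit=10):
    """Return (account_id, score) pairs for one hero, best first."""
    ranked = sorted(by_hero[hero_id].items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:limit]

def percentile(by_hero, hero_id, account_id):
    """Share of players on this hero whose score is <= this player's score."""
    scores = by_hero[hero_id]
    mine = scores[account_id]
    return sum(1 for s in scores.values() if s <= mine) / len(scores)

scores = build_scores(ROWS)
```

Ranking a player's own heroes against each other would use the same data keyed by account_id instead of hero_id.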

howardchung commented 8 years ago

of course we still have the problem of not adding all matches, so perhaps this would need to wait until we do that

howardchung commented 8 years ago

this really won't be accurate until/if we add all matches. I am going to close it, perhaps we can revisit if we ever do the complete history import.

howardchung commented 8 years ago

updated with implementation plan in OP. comment if feedback

Uesugi1 commented 8 years ago

I'm not experienced in code and such, but I'd like to give some input.

I was looking at the dotabuff hero rankings, and what bugged me the most was that players who played in professional games seem to get some sort of premium points boost beyond everything else. They have about 10x fewer games played, yet they are above everyone else, which I don't think should be the case.

Also, from what I can see, you guys are actually taking only ranked games into account, which I think is good. Having unranked games count towards this sort of ranking would be really bad.

howardchung commented 8 years ago

Actually the only metrics are MMR and number of games won on hero. We could make it number of ranked games won on hero, although this would reduce the amount of data we have to draw on.

Uesugi1 commented 8 years ago

Well, ranked games are basically what counts. If somebody wants to get on the leaderboards, then they are at least competitive, so they play ranked. If they try to do it from unranked, then it's usually some sort of abuse and not really a sign of a top player on that hero. You could play unranked for a really long time from the moment you start an account and only play one hero; you'd have ridiculous stats and winrates on it, making it look like you're the best at that hero when you aren't. It would be better to pull the data only from ranked games, where it actually matters.

howardchung commented 8 years ago

Ranked games only is a possibility, although it means an extra join (player_matches with matches).

I think we are going to have to limit to the last 100,000 matches or something. There's just way too much data otherwise (I estimate ~10 billion player_match rows by the time the complete history import is done)

howardchung commented 8 years ago

A question to be asked: Is this methodology flawed? Is someone who has won 100 games on a hero at 2500MMR as good as someone who has won 50 at 5000MMR?

I think games won is a linear indicator of skill whereas MMR is exponential (it becomes much harder to increase in MMR near the top)

I don't know if tuning is needed to accurately compute rankings.

Uesugi1 commented 8 years ago

To answer your first question: the 5000 MMR guy will play better with just 2-4 wins. How I would do it: higher MMR = more points on the rankings, rather than how many games you played. For example, a 2000 MMR guy gets 10 points for winning, a 5000 MMR guy gets 30 points, and 6000-7000 gets something like 35-45 points.

So if somebody thinks they're the top at that hero they will play it, and to be the top at something you need to climb MMR. Climbing MMR gives more points, and more points = higher ranking.

The only problem is that, for example, a 2000 MMR guy plays 100 games, which = 1000 points, and a 5000 MMR guy plays 33 games and will have as much as the 2000 MMR guy. Obviously the number of points needs to be adjusted, since the MMR difference between 2000 and 4000 is huge, then comes 4000 to 5000, and 5000 to 6000-7000 is a more average difference. But winning games gets you points and losing loses you points, just like MMR. Basically hero MMR....
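A rough sketch of this points-per-game idea, only to make the arithmetic concrete. The bracket table is hypothetical: the 10 and 30 point values come from the comment above, 35 is the low end of the suggested 6000-7000 range, and the behaviour between and below brackets is an assumption.

```python
# Hypothetical bracket table: (minimum MMR, points per game), highest first.
BRACKETS = [(6000, 35), (5000, 30), (2000, 10)]

def points_per_game(mmr):
    """Look up the points a single game is worth at this MMR."""
    for floor, points in BRACKETS:
        if mmr >= floor:
            return points
    return 0  # assumption: below the lowest bracket a game is worth nothing

def hero_points(mmr, wins, losses=0):
    """Wins add points and losses subtract them, like MMR itself."""
    return points_per_game(mmr) * (wins - losses)

low = hero_points(2000, 100)   # 100 wins at 2000 MMR
high = hero_points(5000, 33)   # 33 wins at 5000 MMR
```

With these values `low` is 1000 and `high` is 990, which is the near-tie described above and the reason the per-bracket points would need tuning.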

howardchung commented 8 years ago

I'm trying to include three factors:

Games played (players with more games on a hero tend to be better)

Win rate (better players win more)

MMR (better players have higher MMR)

The first two can be combined into one number by taking the number of wins (matches * winrate).

The third is the one we have to figure out how it scales.

I'm thinking the "score per win" should be exponential because MMR is a more important and accurate indicator of skill.

Perhaps something like base^MMR. Examples with base 1.001:

1000 MMR = 1.001^1000 = 2.717

2000 MMR = 7.382

4000 MMR = 54.489

8000 MMR = 2969.065
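A small sketch of the base^MMR idea, applied to the earlier question of 100 wins at 2500 MMR versus 50 wins at 5000 MMR. The function name is hypothetical and base 1.001 is just the example value from above, not a tuned choice.

```python
def exp_score(wins, mmr, base=1.001):
    """Wins weighted by an exponential function of MMR (base is a tunable assumption)."""
    return wins * base ** mmr

# Linear metric (wins * MMR): the two players tie.
linear_a = 100 * 2500   # 250000
linear_b = 50 * 5000    # 250000

# Exponential metric: the higher-MMR player comes out ahead despite fewer wins.
exp_a = exp_score(100, 2500)
exp_b = exp_score(50, 5000)
```

A smaller base (for example the 1.0005 used in the Wolfram Alpha link) flattens the curve, so the choice of base decides how many extra wins can make up for a given MMR gap.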

I don't think an Elo style system with points being added/subtracted will work because Elo requires that we know the ratings of all of a player's opponents/allies, which we don't have because Valve allows players to appear as anonymous.

I am not a math/stats guy so perhaps someone with more expertise could weigh in.

howardchung commented 8 years ago

Wolfram Alpha makes it easy to mess with the MMR curve if you want to play with different values: https://www.wolframalpha.com/input/?i=1.0005%5En+from+1+to+8000

Uesugi1 commented 8 years ago

Unfortunately I'm also not a math guy, so I can't help you there, but I can help you with MMR differences (I play at 5-6k). The difference between 1k and 2k is pretty much irrelevant, maybe even up to 3k, but between 4k and 5k there is a pretty huge difference, and then it slows down from 5k onwards.

howardchung commented 8 years ago

Potential designs:

howardchung commented 8 years ago

Another implementation question to be answered:

Does winrate scale linearly with skill? I'm not sure it does. Example:

Player A has played 300 games on a hero and won 100, for a 33% win rate. Player B has played 150 games on a hero and won 100, for a 67% win rate.

Under the current scheme both players would have an equal ranking (assuming same MMR). Perhaps we should weight winrates so that you get an exponentially increasing number of points for having a high winrate.
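To make the Player A / Player B example concrete, here is a sketch that compares the current scheme with one possible winrate weighting. The exponential weight and the `steepness` parameter are assumptions chosen only for illustration, and the MMR of 4000 is an arbitrary shared value.

```python
import math

def flat_score(games, wins, mmr):
    """Current scheme: wins * MMR, so games played only matters through wins."""
    return wins * mmr

def weighted_score(games, wins, mmr, steepness=4.0):
    """Assumed variant: scale the flat score by exp(steepness * winrate)."""
    winrate = wins / games
    return flat_score(games, wins, mmr) * math.exp(steepness * winrate)

# Player A: 100 wins in 300 games. Player B: 100 wins in 150 games. Same MMR.
a_flat, b_flat = flat_score(300, 100, 4000), flat_score(150, 100, 4000)
a_weighted, b_weighted = weighted_score(300, 100, 4000), weighted_score(150, 100, 4000)
```

Under the flat score the two players tie, while the weighted version ranks Player B higher. How steep the weighting should be is an open tuning question.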

Uesugi1 commented 8 years ago

Winrate is almost always linear with skill. Pick any random hero on dotabuff and go to the player rankings; you will see that almost 99% of people have a winrate over 60% on their respective heroes. The only ones you can find that aren't over 60% are usually people with over 1500 games, and they don't even play ranked.

If you also look at players who play that hero exclusively, you will see that they perform MUCH MUCH better than the usual guy, and they are playing that hero to the fullest, so it is not that hard to keep a steady winrate of 60% if you're really THAT good on that hero.

Calculating based on winrate is pretty good if only ranked games are used and if you're going to use higher MMR = higher ranking. BUT if winrate were used without the ranked-only restriction and without the second system, then you'd have people create new accounts and just stomp low-ranked/unranked games with their hero and receive an absurd number of wins that places them at the top.

howardchung commented 8 years ago

I'm thinking it should be the product of:

Optionally we can restrict it to ranked games only.

howardchung commented 8 years ago

Actually I'm not sure if we should just straight multiply by MMR. Naturally there will be fewer people with higher MMRs, so all else being equal they should make up a corresponding slice of the distribution. They will probably tend to have higher win/loss ratio so perhaps it balances out in the end.