odota / core

Open source Dota 2 data platform
https://www.opendota.com
MIT License

Expected win% incorrectly calculated? #959

Closed albertcui closed 8 years ago

albertcui commented 8 years ago

Reported via Discord

Hey, I'd like to point out an oversight in the dyad and triad picks on YASP. Under the expected win% column, the value is calculated as an average, which is incorrect for hero pairings/triples.

The (mostly) correct formula, for 2 heroes with winrates A and B, is:

Expected winrate = 100 - (AB - 100A - 100B + 10000) / 50

I've written about it here: http://www.dotabuff.com/topics/2016-04-07-statistical-best-hero-combos?page=1#comment-682332

I don't use this platform, so if needed you can contact me on Steam here: http://steamcommunity.com/profiles/76561198047145068

howardchung commented 8 years ago

I don't understand this. Can someone explain it to me

howardchung commented 8 years ago

I would also need it for the 3 hero case if I'm going to implement it.

Destinii commented 8 years ago

I'm not 100% sure, but it seems to fit the data extremely well. I've done a bit more thinking, and if you want to be REALLY accurate, there are actually 3 cases. Basically, when you combine winning or losing heroes, you increase/decrease your winrate with diminishing returns.

Case 1 is if we combine 2 or more >50% WR heroes: we need to make the winrate asymptotically approach 100%.

Case 2 is if we combine 2 or more <50% WR heroes: we need to make the winrate asymptotically approach 0%.

We do this by transforming each winrate to a ratio below 1, so that multiplying the ratios approaches 0, which represents 100% winrate in the case of >50% heroes, or 0% winrate in the case of <50% heroes. For Case 1, we divide by 50, then subtract this number from 2, i.e. (2 - WR/50). For Case 2, we simply divide by 50. We then multiply these ratios together for the combined heroes, and transform the result back into a % winrate.

Case 3 is if we combine a >50% WR hero with a <50% WR hero. In this case I think it is appropriate to average the two winrates. This is what you have currently implemented.

I don't know how difficult it would be to implement 3 functions for a single value, and it would be a bit more complicated for triads. We can use the same function (but multiply by an extra term for the third hero) if we have 3 heroes which are all >50% or <50%. But if we have a 2 hero:1 hero split of over/under 50% WR, I think the best approach is to combine the 2 heroes as in Cases 1 or 2, and then average with the 1 hero as in Case 3.
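The three cases can be sketched in code as follows. This is only an illustration under stated assumptions, not the implementation used on the site: the function names are hypothetical, winrates are percentages, a hero at exactly 50% is grouped with the >50% heroes (the thread does not say how to treat it), and "transform back" is read as the inverse of each ratio transform.

```javascript
// Sketch of the three-case scheme; winrates are percentages (0-100).
// Assumption: a hero at exactly 50% is grouped with the >50% heroes.

// Case 1: all heroes above 50%. Ratios (2 - WR/50) multiply toward 0,
// which maps back to a winrate approaching 100%.
function combineWinning(rates) {
  const product = rates.reduce((acc, wr) => acc * (2 - wr / 50), 1);
  return 50 * (2 - product);
}

// Case 2: all heroes below 50%. Ratios (WR/50) multiply toward 0,
// which maps back to a winrate approaching 0%.
function combineLosing(rates) {
  const product = rates.reduce((acc, wr) => acc * (wr / 50), 1);
  return 50 * product;
}

function expectedWinrate(rates) {
  const winning = rates.filter((wr) => wr >= 50);
  const losing = rates.filter((wr) => wr < 50);
  if (losing.length === 0) return combineWinning(winning);
  if (winning.length === 0) return combineLosing(losing);
  // Case 3: mixed. Combine each side on its own, then average the two results.
  // For a pair this reduces to the plain average of the two winrates.
  return (combineWinning(winning) + combineLosing(losing)) / 2;
}
```

Under these assumptions, two 60% heroes combine to approximately 68%, two 40% heroes to approximately 32%, and a 60%/40% pair averages to 50%.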

What is much easier, and still pretty damn accurate, is to ditch the 3 case scenario and go with a single function. It is still very accurate because the individual hero winrates are close enough to 50% that problems with the model are negligible. These are the calculated winrates according to Case 1, fitted against the actual winrate data.

[Image: herowinrates]

You can try it yourself and see.

For case 1 (what I did above)

2 heroes with winrates A and B:

Recall that we transform to get (2 - A / 50) and (2 - B / 50). Multiply together and transform back; if you simplify, you get:

Compound winrate = 100 - (100 - A)(100 - B) / 50

Similarly, for 3 heroes with winrates A, B and C:

Compound winrate = 100 - (100 - A)(100 - B)(100 - C) / 2500
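A minimal sketch of the single-function approach, with winrates as percentages. The name `compoundWinrate` is hypothetical, and extending the divisor to any number of heroes as 50^(n-1) is an assumption extrapolated from the 2-hero (50) and 3-hero (2500) formulas above.

```javascript
// Winrates are percentages (0-100). compoundWinrate is a hypothetical name.
function compoundWinrate(rates) {
  // Product of (100 - WR) over all heroes.
  const product = rates.reduce((acc, wr) => acc * (100 - wr), 1);
  // The divisor gains one factor of 50 per hero beyond the first: 1, 50, 2500.
  return 100 - product / Math.pow(50, rates.length - 1);
}

console.log(compoundWinrate([55, 60])); // 64
```

For comparison, the plain average of 55% and 60% is 57.5%, so the two methods diverge noticeably even for winrates fairly close to 50%.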

howardchung commented 8 years ago

Let's scale it all to fractions of 1.

So the formula is:

1-(1-A) / (1/1) for 1
1-(1-A)(1-B) / (1/2) for 2
1-(1-A)(1-B)(1-C) / (1/3) for 3?

I am not sure where the 2500 came from.

howardchung commented 8 years ago

In code:

    return 1 - rates.reduce((prev, curr) => (1 - prev) * curr, 1) / (1 / rates.length);

Seems like I'm doing something wrong, I'm seeing really weird values for 2, anyway.

Destinii commented 8 years ago

If you want to scale to fractions of 1, it would be:

1-(100-A)(100-B)/5000 for 2
1-(100-A)(100-B)(100-C)/250000 for 3

The 2500 is because we have an extra 50 in the denominator compared to the 2-hero case. It is just a simplified version of this: 50(2-(2-A/50)(2-B/50)(2-C/50))
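For reference, the simplification can be written out step by step. It uses only the identity 2 - X/50 = (100 - X)/50, with A, B and C as percentages:

```latex
\begin{align*}
50\left(2 - \left(2 - \tfrac{A}{50}\right)\left(2 - \tfrac{B}{50}\right)\left(2 - \tfrac{C}{50}\right)\right)
  &= 100 - 50 \cdot \frac{(100 - A)(100 - B)(100 - C)}{50^{3}} \\
  &= 100 - \frac{(100 - A)(100 - B)(100 - C)}{50^{2}} \\
  &= 100 - \frac{(100 - A)(100 - B)(100 - C)}{2500}
\end{align*}
```

Each hero contributes one factor of 50 to the denominator and the leading 50 cancels one of them, which leaves 50 for two heroes and 2500 for three.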

howardchung commented 8 years ago

A, B and C are all a fraction between 0 and 1. So dividing everything on the right by 100 and scaling down the denominator to fit appears to be: Math.pow(50, rates.length-1), which produces 1, 50, 2500.

howardchung commented 8 years ago

Actually that doesn't seem to work because multiplying numbers between 0 and 1 reduces the result rather than increasing it. I don't like arbitrarily scaling it up to 100, but I'm not sure if there's a better approach.

Implementation:

    return 1 - rates.reduce((prev, curr) => (100 - curr*100) * prev, 1) / (Math.pow(50, rates.length-1)*100);

Destinii commented 8 years ago

Not sure if I follow, plus I don't know coding so I can't help you with that part. In the formulas I wrote, A, B and C are all in %.

howardchung commented 8 years ago

For purity I'd prefer to work with decimals (between 0 and 1) throughout. The only part where they get rendered to % values is at the final step when we display the data.

In order for your formula to work, A, B, and C must all be >1. This means we have to scale the decimal by an arbitrary factor (which is 100 right now). Does the formula hold for any value of this factor?
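One way to check this question numerically is sketched below. Both helper names are hypothetical. The sketch assumes that half the scale factor plays the role of the 50 in the percent formula, so the scale cancels out and the decimal form becomes 1 - 2^(n-1) * (1-A)(1-B)...; with a scale of 100 the first helper matches the implementation posted earlier in this thread.

```javascript
// Hypothetical helper: the same formula for an arbitrary scale factor.
// rates are decimals in [0, 1]; scale is the factor they are multiplied by.
function compoundWithScale(rates, scale) {
  const product = rates.reduce((acc, r) => acc * (scale - r * scale), 1);
  // Half the scale stands in for the 50 of the percent version (assumption).
  const divisor = Math.pow(scale / 2, rates.length - 1) * scale;
  return 1 - product / divisor;
}

// Scale-free version that works directly on decimals.
function compoundDecimal(rates) {
  const product = rates.reduce((acc, r) => acc * (1 - r), 1);
  return 1 - product * Math.pow(2, rates.length - 1);
}

const sample = [0.55, 0.6];
console.log(compoundWithScale(sample, 100)); // approximately 0.64
console.log(compoundWithScale(sample, 1));   // approximately 0.64
console.log(compoundDecimal(sample));        // approximately 0.64
```

If this reading is right, the result does not depend on the factor, provided the 50 is scaled along with it (0.5 when working in decimals), so the rates can stay between 0 and 1 throughout.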