jerkeeler / aoestats-redux-community

aoestats project planning and issue reporting
https://aoestats.io

Feature Request - Modelling #9

Open gowerc opened 1 year ago

gowerc commented 1 year ago

Heya,

I'm the author/developer of ageofstatistics.com. I no longer have the time to maintain the site (especially after the changes / lack of stability with aoe2.net). I was wondering if you would be open to porting some of its features across to your site, now that you are back developing again? :)

Happy to discuss more, but the main one I would want to stress is the use of logistic regression modelling to account for Elo + other covariates.

An unfortunate fact about win rates is that if you don't include explanatory variables (i.e. you just calculate naive win rates), they are biased towards 50%. Since Elo is such an influential part of match outcomes, this means most of the win rates you present will be pulled towards 50% and thus underestimated (albeit the relative ordering / ranking should be preserved).
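To make that concrete, here is a tiny self-contained simulation sketch (all numbers are invented for illustration, not taken from real match data): a civ with a genuine edge looks closer to 50% if you ignore the Elo difference, while a logistic regression that includes it recovers the effect.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import glm

rng = np.random.default_rng(0)
n = 200_000
d = rng.normal(0, 150, n)                        # Elo difference vs. opponent (invented spread)
p = 1 / (1 + np.exp(-(0.12 + 0.008 * d)))        # true win prob: +0.12 log-odds civ edge (~53% at d=0)
df = pd.DataFrame({"w": rng.binomial(1, p), "d": d})

print("naive win rate:   ", df["w"].mean())       # pulled towards 0.50
fit = glm("w ~ d", df, family=sm.families.Binomial()).fit()
print("adjusted at d = 0:", fit.predict(pd.DataFrame({"d": [0.0]}))[0])
# Note: the size of the gap depends on the spread of d; with very tight matchmaking
# the naive and adjusted numbers end up close together.
```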

But yeah, if this is of interest to you (plus any other features from my site that you would like to incorporate) I'd be more than happy to chat. If not, no hard feelings; please feel free to ignore and close this issue :)

jerkeeler commented 1 year ago

Thanks for reaching out @gowerc! First, I LOVE your site. Thank you so much for your work on it.

I would love to incorporate some of your ideas into my site. I agree that your logistic regression model is a much better model than my naive win rates. I really appreciate your methods section. I'm going to look to incorporate your model into my win rate calculations, if that's ok with you? Are there any other confounding variables that you think would be appropriate to incorporate?

I'm also curious if you think the "Averaged Win Rates" as you put it on your website would be more appropriate? I guess it's a bit up to interpretation. My gut says the logistic regression model suffices.

I would also love to add some of those graphs that include all of the civs on one plot. I've been planning something similar for a bit so I will definitely use your site as inspiration for some more graphs.

Want me to give you credit somewhere on the site? Want a social link posted in the FAQ or footer or the like?

jerkeeler commented 1 year ago

Thinking about it a bit more, I could do this same thing and add in map as a parameter to get more accurate map win rate results. 🤔 That seems pretty cool.

jerkeeler commented 1 year ago

Also, I'm assuming that once you fit your model you asked it to predict the civ's win rate given an Elo difference of 0 to get the "overall" win rate? Or am I misunderstanding?

gowerc commented 1 year ago

> I would love to incorporate some of your ideas into my site. I agree that your logistic regression model is a much better model than my naive win rates. I really appreciate your methods section.

If you need help with any of the methods do feel free to ask!

> I'm going to look to incorporate your model into my win rate calculations, if that's ok with you?

I don't own logistic regression :laughing: so definitely fine with me :smile:

> Are there any other confounding variables that you think would be appropriate to incorporate?

(apologies in advance for the wall of text here)

A simple list of things that would realistically affect the outcome of a match

But yeah, it's basically impossible to model all of the above because there just isn't enough data for each player (plus there are way too many civs). Below are some ideas I had to try and capture the above (albeit far from perfect):

> I'm also curious if you think the "Averaged Win Rates" as you put it on your website would be more appropriate? I guess it's a bit up to interpretation. My gut says the logistic regression model suffices.

Just to be clear, both win rate types that I show were created by regression models; the difference is basically how you weight them. In terms of which one is better, there is no correct answer, they just show different things. The "Averaged Win Rate" basically shows your civ's expected win rate assuming your opponent is selecting "Random", whilst the normal win rate shows your civ's win rate assuming your opponent is selecting civs based upon the observed pick rates (e.g. they are more likely to be Franks :smile: )
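As a toy sketch of the difference in weighting (the matchup and pick-rate numbers below are completely made up):

```python
import numpy as np

# rows = your civ, columns = opponent civ, values = P(row civ beats column civ)
matchups = np.array([
    [0.50, 0.55, 0.48],   # e.g. Franks vs (Franks, Cumans, Britons)
    [0.45, 0.50, 0.52],   # Cumans
    [0.52, 0.48, 0.50],   # Britons
])
pick_rates = np.array([0.5, 0.3, 0.2])   # observed opponent pick rates (sum to 1)

averaged_wr = matchups.mean(axis=1)      # "Averaged": opponent is effectively picking Random
weighted_wr = matchups @ pick_rates      # "Normal": opponent picks per observed pick rates
print(averaged_wr, weighted_wr)
```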

> I would also love to add some of those graphs that include all of the civs on one plot. I've been planning something similar for a bit so I will definitely use your site as inspiration for some more graphs.

That would be awesome if you could!! One of the original inspirations for creating my site was I wanted a more visual representation of the data. Your site was amazing for the raw information but (at least personally) I always found plots better for quick visual comparisons.

> Want me to give you credit somewhere on the site? Want a social link posted in the FAQ or footer or the like?

Oh, no need for this at all. I mean, if you want to, feel free, but I don't need / want any credit. I am just happy to see you are back developing, as the community really benefits from a resource like yours :smile:

> Thinking about it a bit more, I could do this same thing and add in map as a parameter to get more accurate map win rate results.

The thing is you would have to add it as a civ * map interaction term. That's perfectly doable, but the problem I found is that the model becomes very hard to fit computationally with so many parameters. I was running it on a 32GB RAM machine and still running out of memory, so I ended up reducing the model and downsampling the data in a few cases. I started looking into more memory-efficient implementations but didn't really get anywhere :cry:
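Roughly what I mean, sketched in the same formula style; the column names (w, c, m, d) are just placeholders, not a real schema:

```python
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import glm

def fit_with_map(df: pd.DataFrame):
    # df: one row per player per match with columns
    #   w = won (0/1), c = civ id, m = map id, d = Elo difference.
    # "C(c):C(m)" adds one column per civ/map pair to the design matrix, which with
    # ~45 civs and dozens of maps is exactly what makes the fit memory hungry.
    return glm("w ~ 0 + C(c):C(m) + d", df, family=sm.families.Binomial()).fit()
```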

gowerc commented 1 year ago

> Also I'm assuming once you fit your model you asked it to predict the civ's win rate given an Elo difference of 0 to get the "overall" win rate

Ya, pretty much this. Looking back over my code, I essentially structured the data as 1 row per player per match with columns civ, diff in elo, and won, e.g.

| match ID | civ     | diff in elo | won |
|----------|---------|-------------|-----|
| 1        | Franks  | +100        | 1   |
| 1        | Cumans  | -100        | 0   |
| 2        | Britons | +64         | 0   |
| 2        | Spanish | -64         | 1   |
| 3        | Huns    | +45         | 1   |
| 3        | Spanish | -45         | 0   |

For team games I just used the difference in mean team Elo (though, yeah, note my bullet point above about how this could be modelled better).

The model is then won ~ 0 + civ + diff_in_elo (using R's modelling notation; the "0" just means don't fit an intercept term). R automatically expands civ out into 1 column per civ. Not sure what language you use on the back end, but Python's patsy module provides similar functionality for creating proper design matrices from categorical data using the above formula notation.
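For example, a toy sketch of that expansion with patsy (rows copied from the made-up table above):

```python
import pandas as pd
from patsy import dmatrices

toy = pd.DataFrame({
    "won":         [1, 0, 0, 1, 1, 0],
    "civ":         ["Franks", "Cumans", "Britons", "Spanish", "Huns", "Spanish"],
    "diff_in_elo": [100, -100, 64, -64, 45, -45],
})

# "0 +" drops the intercept, so C(civ) expands to one indicator column per civ.
y, X = dmatrices("won ~ 0 + C(civ) + diff_in_elo", toy, return_type="dataframe")
print(X.columns.tolist())
```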

A couple of additional points that came to mind:

jerkeeler commented 1 year ago

So I've been playing around with this model and while I think my code is correct, I don't see a large difference in the predicted win rates versus the mean win rate. But perhaps I'm doing something incorrectly and not setting up my model right?

Here's the output, where data_df is a dataframe in which column w is whether the player won or lost, c is the civ number, and d is the difference in rating. C(c) indicates I'm treating the integer as a categorical variable. data_df has 809,582 observations and contains all players on the latest patch across all ratings.

In [74]: data_df.head()
Out[74]:
   w   c     d
0  0  28  16.0
1  1  40   4.0
2  0  16  -8.0
3  1  36   8.0
4  0  27 -22.0

In [75]: glm_mod = glm("w ~ 0 + C(c) + d", data_df, family=sm.families.Binomial()).fit()

In [76]: print(glm_mod.summary())
                 Generalized Linear Model Regression Results
==============================================================================
Dep. Variable:                      w   No. Observations:               809582
Model:                            GLM   Df Residuals:                   809539
Model Family:                Binomial   Df Model:                           42
Link Function:                  Logit   Scale:                          1.0000
Method:                          IRLS   Log-Likelihood:            -5.5389e+05
Date:                Mon, 10 Apr 2023   Deviance:                   1.1078e+06
Time:                        15:06:29   Pearson chi2:                 8.10e+05
No. Iterations:                     5   Pseudo R-squ. (CS):            0.01780
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
C(c)[1]       -0.0458      0.014     -3.257      0.001      -0.073      -0.018
C(c)[2]       -0.1219      0.029     -4.140      0.000      -0.180      -0.064
C(c)[3]        0.1114      0.015      7.504      0.000       0.082       0.140
C(c)[4]       -0.0389      0.021     -1.826      0.068      -0.081       0.003
C(c)[5]       -0.1105      0.011    -10.401      0.000      -0.131      -0.090
C(c)[6]        0.0625      0.015      4.218      0.000       0.033       0.092
C(c)[7]       -0.0416      0.016     -2.652      0.008      -0.072      -0.011
C(c)[8]       -0.0679      0.022     -3.105      0.002      -0.111      -0.025
C(c)[9]       -0.0576      0.013     -4.393      0.000      -0.083      -0.032
C(c)[10]       0.0029      0.017      0.177      0.859      -0.030       0.035
C(c)[11]      -0.1814      0.016    -11.283      0.000      -0.213      -0.150
C(c)[12]      -0.0037      0.015     -0.252      0.801      -0.032       0.025
C(c)[13]       0.0125      0.026      0.479      0.632      -0.039       0.064
C(c)[14]       0.0123      0.012      1.001      0.317      -0.012       0.036
C(c)[15]       0.1236      0.008     14.894      0.000       0.107       0.140
C(c)[16]      -0.0115      0.014     -0.803      0.422      -0.040       0.017
C(c)[17]       0.1025      0.020      5.142      0.000       0.063       0.142
C(c)[18]      -0.0511      0.014     -3.651      0.000      -0.078      -0.024
C(c)[19]       0.0289      0.014      2.054      0.040       0.001       0.056
C(c)[20]      -0.0340      0.020     -1.669      0.095      -0.074       0.006
C(c)[21]      -0.0556      0.016     -3.436      0.001      -0.087      -0.024
C(c)[22]      -0.0048      0.015     -0.321      0.748      -0.034       0.025
C(c)[23]      -0.0633      0.013     -4.713      0.000      -0.090      -0.037
C(c)[24]      -0.1296      0.020     -6.639      0.000      -0.168      -0.091
C(c)[25]       0.0714      0.011      6.458      0.000       0.050       0.093
C(c)[26]       0.0141      0.012      1.210      0.226      -0.009       0.037
C(c)[27]      -0.1454      0.020     -7.354      0.000      -0.184      -0.107
C(c)[28]       0.0313      0.017      1.792      0.073      -0.003       0.066
C(c)[29]      -0.0691      0.012     -5.555      0.000      -0.094      -0.045
C(c)[30]       0.0483      0.010      4.708      0.000       0.028       0.068
C(c)[31]      -0.0273      0.016     -1.750      0.080      -0.058       0.003
C(c)[32]      -0.0240      0.015     -1.628      0.104      -0.053       0.005
C(c)[33]       0.1264      0.012     10.484      0.000       0.103       0.150
C(c)[34]      -0.1130      0.018     -6.356      0.000      -0.148      -0.078
C(c)[35]      -0.0341      0.022     -1.577      0.115      -0.076       0.008
C(c)[36]      -0.0424      0.020     -2.145      0.032      -0.081      -0.004
C(c)[37]       0.0486      0.012      3.951      0.000       0.024       0.073
C(c)[38]      -0.1010      0.018     -5.616      0.000      -0.136      -0.066
C(c)[39]       0.0300      0.013      2.235      0.025       0.004       0.056
C(c)[40]       0.1146      0.013      9.067      0.000       0.090       0.139
C(c)[41]      -0.2210      0.018    -12.349      0.000      -0.256      -0.186
C(c)[42]       0.0698      0.016      4.418      0.000       0.039       0.101
d              0.0079   7.67e-05    103.119      0.000       0.008       0.008
==============================================================================

In [77]: glm_mod.predict(pd.DataFrame({"c": [Civ.franks.value, Civ.huns.value, Civ.chinese.value], "d": [0, 0, 0]}))
Out[77]:
0    0.530856
1    0.507214
2    0.454768
dtype: float64

The very naive win rates right now respectively are:

53.11
50.63
45.45
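
For reference, here's roughly how the two sets of numbers can be put side by side (a sketch assuming the same data_df and fitted glm_mod as above, not my exact script):

```python
import pandas as pd

# Naive per-civ win rate straight from the data...
naive = data_df.groupby("c")["w"].mean()

# ...versus the model's prediction for every civ at an Elo difference of 0.
civs = sorted(data_df["c"].unique())
adjusted = glm_mod.predict(pd.DataFrame({"c": civs, "d": 0}))

comparison = pd.DataFrame({"naive": naive.loc[civs].values,
                           "adjusted": adjusted.values}, index=civs)
print(comparison.head())
```
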
gowerc commented 1 year ago

Hard to say for sure without access to the code & data, though nothing you've shown above looks obviously wrong. Must admit I am a bit surprised; I will double check what I was seeing with my historic cuts of the data. Some general thoughts:

1) Originally I was fitting on much smaller cuts of data than what you have used (e.g. >1200 Elo on open maps only), so I would assume you would see greater swings in the results on a comparable cut. Likewise, if I remember correctly, when I looked at this in the past I was seeing swings on the order of 0.2-0.4 percentage points (e.g. 50.4 -> 50.8). I would be curious what you see if you filter to >=1200 Elo (rough sketch of this below).
2) Generally civ choice matters less at lower Elos (and they make up the bulk of the data). Part of me is wondering if the difference in Elo matters less at lower levels as well? I never looked into a variable Elo coefficient (also sketched below).
3) You can see from the pseudo R-squared value that the overall predictive power is quite poor; this means there is still a lot of opportunity to refine the model based upon some of the other factors I mentioned (this was the main thing I was excited to push into but never got around to).
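
A rough sketch of what points 1 and 2 could look like in the same statsmodels style (note: this assumes a "rating" column for the match's mean Elo, which isn't in the data_df shown above, and the band edges are arbitrary):

```python
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import glm

# 1) Refit on the >= 1200 Elo slice only ("rating" is an assumed extra column).
high_df = data_df[data_df["rating"] >= 1200]
glm_high = glm("w ~ 0 + C(c) + d", high_df, family=sm.families.Binomial()).fit()

# 2) Variable Elo coefficient: let the slope on d differ by rating band.
data_df["band"] = pd.cut(data_df["rating"], bins=[0, 1000, 1200, 1600, 4000])
glm_band = glm("w ~ 0 + C(c) + d:C(band)", data_df, family=sm.families.Binomial()).fit()
```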

gowerc commented 1 year ago

Doing some quick sanity checks: the theoretical win % of someone with a 25 Elo advantage is 53.59%, and according to your model it's coming out as 54.92%, which is very much in the same ballpark, so I would be very surprised if there was a mistake in your code.
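Writing that arithmetic out explicitly (standard Elo expected-score formula vs. the fitted d coefficient of 0.0079 from the summary above):

```python
import math

d = 25
elo_expected = 1 / (1 + 10 ** (-d / 400))          # standard Elo formula        -> ~0.5359
model_expected = 1 / (1 + math.exp(-0.0079 * d))   # logistic with fitted slope,
                                                   # civ term ignored            -> ~0.5492
print(round(elo_expected, 4), round(model_expected, 4))
```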