huffyhenry closed 5 years ago
The exponential distribution is a better fit on the (shooting) team level too:
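library(dplyr)
library(MASS)  # provides fitdistr()

# Log-likelihood of an exponential fit to vec
expoL <- function(vec) {
  as.numeric(logLik(fitdistr(vec, "exponential")))
}

# Log-likelihood of a beta fit to vec (fitdistr needs starting values for "beta")
betaL <- function(vec) {
  as.numeric(logLik(fitdistr(vec, "beta", start = list(shape1 = 0.5, shape2 = 0.5))))
}

read.csv("../sbs-xg-review/data/sb.csv") %>%
  filter(competition_name == "Premier League") %>%
  filter(shot_set_play != "penalty") %>%
  group_by(team_name) %>%
  summarize(
    expo = expoL(shot_xg),
    beta = betaL(shot_xg)
  ) %>%
  # For each team, pick the distribution with the higher log-likelihood
  mutate(choice = ifelse(beta > expo, "beta", "expo"))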
yields
That said, it is entirely possible that a beta (or another) model, where both teams modify the distribution parameters, is a better fit.
Shooting is best modelled as a two-step process, where shot quality is first sampled from an appropriate distribution, and conversion is then a Bernoulli experiment. The (unnormalized) likelihood of a datapoint then becomes the definite integral over (0,1) of $x_i \, \mathrm{pdf}(x_i \mid \theta) \, dx_i$ for a goal and of $(1 - x_i) \, \mathrm{pdf}(x_i \mid \theta) \, dx_i$ for a miss, where $\mathrm{pdf}$ is the probability density function of the distribution of shot qualities, and $\theta$ are the parameters of that distribution, which can depend on any factors of interest, especially team identities and game state.

Cf. #6.
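To make the two-step likelihood concrete, here is a minimal sketch in R. It assumes a beta distribution for shot quality purely for illustration. The helper name `shot_lik`, its arguments, and the parameter values are hypothetical: they are not taken from the code above and are not fitted to any data.

```r
# Sketch only: two-step shot likelihood with an ASSUMED Beta(shape1, shape2)
# distribution of shot quality. shot_lik() is a hypothetical helper name.
shot_lik <- function(goal, shape1, shape2) {
  # Integrand: conversion probability (x for a goal, 1 - x for a miss)
  # weighted by the density of shot quality x.
  integrand <- function(x) {
    conv <- if (goal) x else 1 - x
    conv * dbeta(x, shape1, shape2)
  }
  # Definite integral over (0, 1)
  integrate(integrand, lower = 0, upper = 1)$value
}

# Illustrative parameter values (assumed, not estimated from data)
a <- 2
b <- 8
p_goal <- shot_lik(TRUE, a, b)
p_miss <- shot_lik(FALSE, a, b)

# Log-likelihood of a made-up sequence of shot outcomes (TRUE = goal)
outcomes <- c(TRUE, FALSE, FALSE, FALSE, TRUE)
loglik <- sum(log(ifelse(outcomes, p_goal, p_miss)))
```

In this sketch the goal integral is just the mean of the quality distribution, `a / (a + b)` for a beta, and the miss integral is its complement. Letting `a` and `b` depend on team identities or game state would be the natural next step, but that is left out here.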