huffyhenry / shot-generation

Bayesian modelling of shot generation and conversion in soccer
GNU Lesser General Public License v3.0

Model shooting as a generative process #27

Closed: huffyhenry closed this issue 5 years ago

huffyhenry commented 6 years ago

Shooting is best modelled as a two-step process: shot quality `xi` is first sampled from an appropriate distribution on (0,1), and conversion is then a Bernoulli trial with success probability `xi`. The (unnormalized) likelihood of a datapoint then becomes the definite integral over (0,1) of `xi * pdf(xi | theta) dxi` for a goal and of `(1 - xi) * pdf(xi | theta) dxi` for a miss. Here `pdf` is the probability density function of the distribution of shot qualities and `theta` are the parameters of that distribution, which can depend on any factors of interest, especially team identities and game state.
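For reference, the two likelihood terms described above can be written out as follows. This is only a transcription of the text, writing ξ for `xi` and p for `pdf`:

```latex
\begin{align*}
L(\text{goal} \mid \theta) &\propto \int_0^1 \xi \, p(\xi \mid \theta) \, d\xi, \\
L(\text{miss} \mid \theta) &\propto \int_0^1 (1 - \xi) \, p(\xi \mid \theta) \, d\xi .
\end{align*}
```

Under the additional assumption (not stated in the thread) that p is a properly normalized density on (0,1) and that only the shot outcome is observed, the first integral is the mean shot quality E[ξ | θ] and the second is 1 - E[ξ | θ]. For a Beta(a, b) distribution, for example, these would be a/(a+b) and b/(a+b).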

Cf. #6.

huffyhenry commented 6 years ago

The exponential distribution is a better fit than the beta at the (shooting) team level too:

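```r
library(MASS)   # fitdistr(); loaded first so it does not mask dplyr::select()
library(dplyr)

# Maximised log-likelihood of an exponential fit to a vector of shot xG values
expoL <- function(vec) {
  as.numeric(logLik(fitdistr(vec, "exponential")))
}

# Maximised log-likelihood of a beta fit; fitdistr() needs starting values here
betaL <- function(vec) {
  as.numeric(logLik(fitdistr(vec, "beta", start = list(shape1 = 0.5, shape2 = 0.5))))
}

read.csv("../sbs-xg-review/data/sb.csv") %>%
  filter(competition_name == "Premier League") %>%
  filter(shot_set_play != "penalty") %>%   # exclude penalties
  group_by(team_name) %>%
  summarize(
    expo = expoL(shot_xg),
    beta = betaL(shot_xg)
  ) %>%
  # for each team, pick the distribution with the higher log-likelihood
  mutate(choice = ifelse(beta > expo, "beta", "expo"))
```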

yields

[image: untitled]

That said, it is entirely possible that a beta (or another) model, where both teams modify the distribution parameters, is a better fit.