nflverse / nflverse-pbp

builds play by play and player stats for nflverse/nflverse-data
Creative Commons Attribution 4.0 International

something odd with WPA #36

Open ak47twq opened 4 years ago

ak47twq commented 4 years ago

I compared the difference between consecutive plays' home_wp_post with the WPA stored in the database. Is WPA supposed to be the difference between two plays' home_wp_post? Most numbers check out, but some don't make sense.

Why does a timeout have a different home_wp_post?

Here is what I did:

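library(dplyr)
library(tictoc)

# pbp is assumed to be a remote (database) table, hence collect()
tic()
test <- pbp %>%
  filter(game_id == "2009_18_GB_ARI", !is.na(home_wp_post)) %>%
  select(game_id, play_id, qtr, desc, total, spread_line, home_wp_post, wpa) %>%
  collect()
toc()

# Absolute WPA as stored in the data
tic()
test <- test %>%
  mutate(wp_diff1 = abs(wpa))
toc()

# Absolute change in home_wp_post between consecutive plays
tic()
test$wp_diff2 <- NA_real_
test$wp_diff2[1] <- 0
for (i in seq_len(nrow(test))[-1]) {
  test$wp_diff2[i] <- abs(test$home_wp_post[i] - test$home_wp_post[i - 1])
}
toc()

# Plays where the two measures disagree
temp <- test %>% filter(wp_diff2 != wp_diff1)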

(Screenshots attached: WPA1, WPA2)

mrcaseb commented 4 years ago

Here is some more efficient code to reproduce this:

pbp %>%
  filter(game_id == "2009_18_GB_ARI", !is.na(home_wp_post)) %>%
  select(game_id, play_id, play_type, desc, home_team, posteam, wp, home_wp, wpa, home_wp_post) %>%
  mutate(
    wp_diff1 = abs(wpa),
    wp_diff2 = abs(home_wp_post - lag(home_wp_post))
  ) %>%
  filter(wp_diff2 != wp_diff1)

Output:

# A tibble: 4 x 12
  game_id   play_id play_type desc                                home_team posteam    wp home_wp      wpa home_wp_post wp_diff1 wp_diff2
  <chr>       <dbl> <chr>     <chr>                               <chr>     <chr>   <dbl>   <dbl>    <dbl>        <dbl>    <dbl>    <dbl>
1 2009_18_~    1416 no_play   (7:42) J.Kuhn right tackle to ARI ~ ARI       GB      0.153   0.847  0.00151        0.847  0.00151  0      
2 2009_18_~    1437 run       (7:02) A.Rodgers up the middle for~ ARI       GB      0.155   0.845 -0.00730        0.852  0.00730  0.00580
3 2009_18_~    4108 no_play   Timeout #1 by ARI at 01:46.         ARI       GB      0.639   0.361  0              0.361  0        0.278  
4 2009_18_~    4125 pass      (1:46) (Shotgun) K.Warner pass sho~ ARI       ARI     0.639   0.639  0.0216         0.661  0.0216   0.300  
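
One way to read the rows above is to check each play against its own pre-play value instead of the previous play's post-play value. The sketch below assumes (this is not confirmed anywhere in this thread) that wpa is measured from the possession team's perspective, so its sign is flipped when posteam is the away team. The four rows are copied from the printed output, so they carry rounding error; the names home_wpa, expected_post and consistent are made up for illustration.

```r
library(dplyr)

# Four rows copied (rounded) from the output above; not the full game.
rows <- tibble::tribble(
  ~play_id, ~home_team, ~posteam, ~home_wp, ~wpa,     ~home_wp_post,
  1416,     "ARI",      "GB",     0.847,    0.00151,  0.847,
  1437,     "ARI",      "GB",     0.845,   -0.00730,  0.852,
  4108,     "ARI",      "GB",     0.361,    0,        0.361,
  4125,     "ARI",      "ARI",    0.639,    0.0216,   0.661
)

check <- rows %>%
  mutate(
    # Assumption: wpa is from the possession team's perspective,
    # so flip the sign when the possession team is the away team.
    home_wpa = if_else(posteam == home_team, wpa, -wpa),
    # Post-play home win probability implied by home_wp and wpa.
    expected_post = home_wp + home_wpa,
    # Loose tolerance because the printed values are rounded.
    consistent = abs(expected_post - home_wp_post) < 1e-3
  )

print(check)
```

Under that assumption and with these rounded inputs, only play 1416 is flagged as inconsistent. Comparing with a tolerance rather than `!=` also avoids spurious mismatches from floating-point noise.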

The home_wp_post of play 1416 is modified in this line: https://github.com/mrcaseb/nflfastR/blob/9ae4bb1951a5b4302bc0e3e83261f5bb4406af32/R/helper_add_ep_wp.R#L1011, where home_wp_post is set to the previous play's value if both the current play and the previous play are "no_play"s.
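
The carry-forward rule described above can be sketched on toy data as follows. This is a minimal illustration of the rule as worded in this comment, not the actual code in helper_add_ep_wp.R; the data values and the column names prev_type and home_wp_post_adj are hypothetical.

```r
library(dplyr)

# Toy data; play types and probabilities are made up for illustration.
toy <- tibble::tibble(
  play_type    = c("run", "no_play", "no_play", "pass"),
  home_wp_post = c(0.50, 0.52, 0.55, 0.60)
)

carried <- toy %>%
  mutate(
    # default = "" keeps the first row's condition FALSE instead of NA
    prev_type = lag(play_type, default = ""),
    # If this play and the previous one are both "no_play",
    # reuse the previous play's home_wp_post.
    home_wp_post_adj = if_else(
      play_type == "no_play" & prev_type == "no_play",
      lag(home_wp_post),
      home_wp_post
    )
  )

print(carried)
```

Here the third row takes the second row's value, so the difference between consecutive home_wp_post values becomes 0 for that play even when its wpa is nonzero, which matches the pattern seen for play 1416. Note that this vectorized version looks up the original previous value, so a run of three or more "no_play"s could behave differently from a sequential implementation.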

Play 4108 appears to have home_wp and away_wp switched.

Any insights, @guga31bb?

guga31bb commented 4 years ago

This is the equivalent part in nflscrapR, and I guess we must have modified it at some point, though I can't remember why. I personally have never used home_wp_post or WPA, so I'm surprised we bothered to modify nflscrapR here. There must have been some bug addressed at some point?

mrcaseb commented 4 years ago

Finally found the commit, but it's not really informative lol: https://github.com/mrcaseb/fastscraper/commit/12a03f956b313bcf6b247159474100aa93ae7403#diff-0a766e08dadf2046e3cf5c64e0d680e7315073ec7f764627a0d55f68e13136c0

It's lines 766-769 in that commit.

guga31bb commented 4 years ago

That commit was mostly me just copying and pasting nflscrapR's part. But it's weird, because it doesn't look identical to nflscrapR in that section.