nflverse / nflfastR

A Set of Functions to Efficiently Scrape NFL Play by Play Data
https://www.nflfastr.com/

[BUG] "number of items to replace is not a multiple of replacement length" #432

Closed dennisbrookner closed 12 months ago

dennisbrookner commented 12 months ago

Is there an existing issue for this?

Have you installed the latest development version of the package(s) in question?

What version of the package do you have?

4.5.1.9013

Describe the bug

Calling update_db() throws the following error message:

> update_db()
── Update nflfastR Play-by-Play Database ─────────────────────────────────────── nflfastR version 4.5.1.9013 ──
• 17:46:33 | Checking for missing completed games...
ℹ 17:46:34 | You have 6435 games and are missing 8.
• 17:46:34 | Start download of 8 games...
ℹ It is recommended to use parallel processing when trying to load multiple games.Please consider running `future::plan("multisession")`! Will go on sequentially...
✔ 17:46:39 | Download finished. Adding variables...
✔ 17:46:39 | added game variables
✔ 17:46:39 | added nflscrapR variables
Error in X[, pstart[i] - 1 + 1:object$nsdf[i]] <- Xp : 
  number of items to replace is not a multiple of replacement length
> 
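
For readers unfamiliar with this message, here is a minimal base R sketch of the kind of assignment that raises it. It only illustrates the error class (a matrix slice receiving a replacement whose length does not divide the number of selected cells). It is not the mgcv code path from the report, and the toy values of `X` and `Xp` are assumptions made up for the example.

```r
# A 2 x 3 matrix: X[, 1:2] selects 2 rows x 2 columns = 4 cells.
X <- matrix(0, nrow = 2, ncol = 3)

# Toy replacement of length 3; 4 is not a multiple of 3.
Xp <- 1:3

# Capture the error message instead of stopping the script.
msg <- tryCatch(
  {
    X[, 1:2] <- Xp
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(msg)

# A replacement whose length divides 4 is recycled without complaint.
X[, 1:2] <- 1:2
print(X)
```

In the traceback further down, the failing assignment sits inside `predict.gam()`, so a mismatch like this would point at the shape of the data being passed to the model rather than at user code.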

This is true for:

Reprex

```r
nflfastR::update_db()
#> ── Update nflfastR Play-by-Play Database ──────── nflfastR version 4.5.1.9013 ──
#> ℹ 17:51:21 | Can't find the data table "nflfastR_pbp"
#> in your database. Will load the play by play data from
#> scratch.
#> 
#> • 17:51:21 | Starting download of 25 seasons between 1999 and 2023...
#> 
#> • 17:54:38 | Checking for missing completed games...
#> 
#> ℹ 17:54:41 | You have 6435 games and are missing 8.
#> 
#> • 17:54:41 | Start download of 8 games...
#> 
#> ℹ It is recommended to use parallel processing when trying to load multiple games.Please consider running `future::plan("multisession")`! Will go on sequentially...
#> 
#> ✔ 17:54:48 | Download finished. Adding variables...
#> 
#> ✔ 17:54:48 | added game variables
#> 
#> ✔ 17:54:48 | added nflscrapR variables
#> [17:54:48] WARNING: amalgamation/../src/learner.cc:438: 
#>   If you are loading a serialized model (like pickle in Python, RDS in R) generated by
#>   older XGBoost, please export the model by calling `Booster.save_model` from that version
#>   first, then load it back in current version. See:
#> 
#>     https://xgboost.readthedocs.io/en/latest/tutorials/saving_model.html
#> 
#>   for more details about differences between saving model and serializing.
#> Error in X[, pstart[i] - 1 + 1:object$nsdf[i]] <- Xp: number of items to replace is not a multiple of replacement length
```

Created on 2023-09-17 with reprex v2.0.2


### Expected Behavior

`update_db()` should run and grab the latest play-by-play data, rebuilding when requested.

### nflverse_sitrep

```r
nflverse_sitrep()
── System Info ────────────────────────────────────────────────────────────────────────────────────────────────
• R version 4.2.1 (2022-06-23) • Running under: macOS Ventura 13.1
── Package Status ─────────────────────────────────────────────────────────────────────────────────────────────
   package  installed  cran        dev behind
1   nfl4th      1.0.4 1.0.4 1.0.4.9000    dev
2 nflfastR 4.5.1.9013 4.5.1 4.5.1.9013       
3 nflplotR      1.1.0 1.1.0 1.1.0.9006    dev
4 nflreadr      1.4.0 1.4.0   1.4.0.03    dev
5 nflseedR      1.2.0 1.2.0      1.2.0       
6 nflverse      1.0.3 1.0.3      1.0.3       
── Package Options ────────────────────────────────────────────────────────────────────────────────────────────
• No options set for above packages
── Package Dependencies ───────────────────────────────────────────────────────────────────────────────────────
• askpass     (1.1)     • gsubfn     (0.7)       • proto        (1.0.0)    
• backports   (1.4.1)   • gtable     (0.3.1)     • purrr        (0.3.5)    
• cachem      (1.0.6)   • httr       (1.4.4)     • R6           (2.5.1)    
• cli         (3.4.1)   • isoband    (0.2.6)     • rappdirs     (0.3.3)    
• codetools   (0.2-18)  • janitor    (2.1.0)     • RColorBrewer (1.1-3)    
• colorspace  (2.0-3)   • jsonlite   (1.8.2)     • Rcpp         (1.0.9)    
• compiler    (4.2.1)   • labeling   (0.4.2)     • rlang        (1.0.6)    
• cpp11       (0.4.3)   • lattice    (0.20-45)   • rstudioapi   (0.14)     
• crayon      (1.5.2)   • lifecycle  (1.0.3)     • scales       (1.2.1)    
• curl        (4.3.3)   • listenv    (0.8.0)     • snakecase    (0.11.0)   
• data.table  (1.14.4)  • lubridate  (1.8.0)     • splines      (4.2.1)    
• digest      (0.6.30)  • magick     (2.7.3)     • stats        (4.2.1)    
• dplyr       (1.0.10)  • magrittr   (2.0.3)     • stringi      (1.7.8)    
• ellipsis    (0.3.2)   • MASS       (7.3-58.1)  • stringr      (1.4.1)    
• fansi       (1.0.3)   • Matrix     (1.5-1)     • sys          (3.4.1)    
• farver      (2.1.1)   • memoise    (2.0.1)     • tibble       (3.1.8)    
• fastmap     (1.1.0)   • methods    (4.2.1)     • tidyr        (1.2.1)    
• fastrmodels (1.0.2)   • mgcv       (1.8-40)    • tidyselect   (1.2.0)    
• furrr       (0.3.1)   • mime       (0.12)      • tools        (4.2.1)    
• future      (1.28.0)  • munsell    (0.5.0)     • utf8         (1.2.2)    
• generics    (0.1.3)   • nlme       (3.1-160)   • utils        (4.2.1)    
• ggplot2     (3.3.6)   • openssl    (2.0.4)     • vctrs        (0.4.2)    
• globals     (0.16.1)  • parallel   (4.2.1)     • viridisLite  (0.4.1)    
• glue        (1.6.2)   • parallelly (1.32.1)    • withr        (2.5.0)    
• graphics    (4.2.1)   • pillar     (1.8.1)     • xgboost      (1.6.0.1)  
• grDevices   (4.2.1)   • pkgconfig  (2.0.3)       
• grid        (4.2.1)   • progressr  (0.11.0)      
───────────────────────────────────────────────────────────────────────────────────────────────────────────────
>
```

Screenshots

No response

Additional context

Full error traceback (originates from a slightly nested call, but analogous to the above)

Error in X[, pstart[i] - 1 + 1:object$nsdf[i]] <- Xp :
number of items to replace is not a multiple of replacement length
67.
predict.gam(object, newdata = newdata, type = type, se.fit = se.fit,
terms = terms, exclude = exclude, block.size = block.size,
newdata.guaranteed = newdata.guaranteed, na.action = na.action,
...)
66.
mgcv::predict.bam(fastrmodels::fg_model, newdata = pbp_data,
type = "response")
65.
add_ep_variables(.)
64.
pbp %>% add_ep_variables()
63.
add_ep(.)
62.
dplyr::filter(., !is.na(.data$air_yards))
61.
pbp %>% dplyr::filter(!is.na(.data$air_yards))
60.
nrow(pbp %>% dplyr::filter(!is.na(.data$air_yards)))
59.
add_air_yac_ep(.)
58.
nrow(pbp_data)
57.
add_wp_variables(.)
56.
pbp %>% add_wp_variables()
55.
add_wp(.)
54.
dplyr::filter(., !is.na(.data$air_yards))
53.
pbp %>% dplyr::filter(!is.na(.data$air_yards))
52.
nrow(pbp %>% dplyr::filter(!is.na(.data$air_yards)))
51.
add_air_yac_wp(.)
50.
dplyr::mutate(., receiver_player_name = stringr::str_extract(.data$desc,
"(?<=((to)|(for))\\s[:digit:]{0,2}\\-{0,1})[A-Z][A-z]*\\.\\s?[A-Z][A-z]+(\\s(I{2,3})|(IV))?"),
pass_middle = dplyr::if_else(.data$pass_location == "middle",
1, 0), air_is_zero = dplyr::if_else(.data$air_yards == ...
49.
dplyr::select(., "complete_pass", "air_yards", "yardline_100",
"ydstogo", "down1", "down2", "down3", "down4", "air_is_zero",
"pass_middle", "era2", "era3", "era4", "qb_hit", "home",
"outdoors", "retractable", "dome", "distance_to_sticks", ...
48.
pbp %>% dplyr::mutate(receiver_player_name = stringr::str_extract(.data$desc,
"(?<=((to)|(for))\\s[:digit:]{0,2}\\-{0,1})[A-Z][A-z]*\\.\\s?[A-Z][A-z]+(\\s(I{2,3})|(IV))?"),
pass_middle = dplyr::if_else(.data$pass_location == "middle",
1, 0), air_is_zero = dplyr::if_else(.data$air_yards == ...
47.
prepare_cp_data(pbp)
46.
add_cp(.)
45.
dplyr::mutate(., old_posteam = .data$posteam, posteam = dplyr::case_when(.data$kickoff_attempt ==
1 & (.data$own_kickoff_recovery == 1 | .data$fumble_lost ==
1) ~ .data$defteam, stringr::str_detect(.data$desc, kickoff_finder) &
.data$own_kickoff_recovery == 0 & dplyr::lead(.data$own_kickoff_recovery == ...
44.
dplyr::group_by(., .data$game_id, .data$game_half)
43.
dplyr::mutate(., row = 1:dplyr::n(), new_drive = dplyr::if_else(.data$posteam !=
dplyr::lag(.data$posteam) | (.data$posteam != dplyr::lag(.data$posteam,
2) & is.na(dplyr::lag(.data$posteam))) | (.data$posteam !=
dplyr::lag(.data$posteam, 3) & is.na(dplyr::lag(.data$posteam, ...
42.
dplyr::group_by(., .data$game_id)
41.
dplyr::mutate(., fixed_drive = cumsum(.data$new_drive), tmp_result = dplyr::case_when(.data$touchdown ==
1 & .data$posteam == .data$td_team ~ "Touchdown", .data$touchdown ==
1 & .data$posteam != .data$td_team ~ "Opp touchdown", .data$field_goal_result ==
"made" ~ "Field goal", .data$field_goal_result %in% c("blocked", ...
40.
dplyr::group_by(., .data$game_id, .data$fixed_drive)
39.
dplyr::mutate(., fixed_drive_result = dplyr::if_else(dplyr::last(stats::na.omit(.data$tmp_result)) ==
"End of half", dplyr::first(stats::na.omit(.data$tmp_result)),
dplyr::last(stats::na.omit(.data$tmp_result))))
38.
dplyr::ungroup(.)
37.
dplyr::mutate(., posteam = .data$old_posteam)
36.
dplyr::select(., -"row", -"new_drive", -"tmp_result", -"old_posteam")
35.
d %>% dplyr::mutate(old_posteam = .data$posteam, posteam = dplyr::case_when(.data$kickoff_attempt ==
1 & (.data$own_kickoff_recovery == 1 | .data$fumble_lost ==
1) ~ .data$defteam, stringr::str_detect(.data$desc, kickoff_finder) &
.data$own_kickoff_recovery == 0 & dplyr::lead(.data$own_kickoff_recovery == ...
34.
add_drive_results(.)
33.
dplyr::mutate(., old_posteam = .data$posteam, posteam = dplyr::case_when(.data$kickoff_attempt ==
1 & (.data$own_kickoff_recovery == 1 | .data$fumble_lost ==
1) ~ .data$defteam, stringr::str_detect(.data$desc, kickoff_finder) &
.data$own_kickoff_recovery == 0 & dplyr::lead(.data$own_kickoff_recovery == ...
32.
dplyr::group_by(., .data$game_id, .data$game_half)
31.
dplyr::mutate(., row = 1:dplyr::n(), new_series = dplyr::if_else(.data$fixed_drive !=
dplyr::lag(.data$fixed_drive) | ((dplyr::lag(.data$first_down_rush) ==
1 | dplyr::lag(.data$first_down_pass) == 1 | dplyr::lag(.data$first_down_penalty) ==
1) & dplyr::lag(.data$touchdown) == 0) | .data$row == 1, ...
30.
dplyr::group_by(., .data$game_id)
29.
dplyr::mutate(., series = cumsum(.data$new_series), tmp_result = dplyr::case_when((.data$first_down_penalty ==
1 | .data$first_down_rush == 1 | .data$first_down_pass ==
1) & touchdown == 0 ~ "First down", .data$touchdown == 1 &
.data$posteam == .data$td_team ~ "Touchdown", .data$touchdown == ...
28.
dplyr::group_by(., .data$game_id, .data$series)
27.
dplyr::mutate(., series_result = dplyr::if_else(dplyr::last(stats::na.omit(.data$tmp_result)) ==
"End of half", dplyr::first(stats::na.omit(.data$tmp_result)),
dplyr::last(stats::na.omit(.data$tmp_result))), series_success = dplyr::if_else(.data$series_result %in%
c("Touchdown", "First down"), 1, 0))
26.
dplyr::ungroup(.)
25.
dplyr::mutate(., posteam = .data$old_posteam)
24.
dplyr::select(., -"row", -"tmp_result", -"new_series", -"old_posteam")
23.
pbp %>% dplyr::mutate(old_posteam = .data$posteam, posteam = dplyr::case_when(.data$kickoff_attempt ==
1 & (.data$own_kickoff_recovery == 1 | .data$fumble_lost ==
1) ~ .data$defteam, stringr::str_detect(.data$desc, kickoff_finder) &
.data$own_kickoff_recovery == 0 & dplyr::lead(.data$own_kickoff_recovery == ...
22.
add_series_data(.)
21.
dplyr::select(., tidyselect::any_of(c(nflscrapr_cols, new_cols,
api_cols)))
20.
pbp %>% dplyr::select(tidyselect::any_of(c(nflscrapr_cols, new_cols,
api_cols)))
19.
withCallingHandlers(expr, warning = function(w) if (inherits(w,
classes)) tryInvokeRestart("muffleWarning"))
18.
suppressWarnings(out <- pbp %>% dplyr::select(tidyselect::any_of(c(nflscrapr_cols,
new_cols, api_cols))))
17.
select_variables(.)
16.
pbp %>% add_game_data(...) %>% add_nflscrapr_mutations() %>%
add_ep() %>% add_air_yac_ep() %>% add_wp() %>% add_air_yac_wp() %>%
add_cp() %>% add_drive_results() %>% add_series_data() %>%
select_variables()
15.
withCallingHandlers(expr, warning = function(w) if (inherits(w,
classes)) tryInvokeRestart("muffleWarning"))
14.
suppressWarnings({
p <- progressr::progressor(along = game_ids)
pbp <- furrr::future_map_dfr(game_ids, function(x, p, dir,
...) { ...
13.
fast_scraper(game_ids = game_ids, dir = dir, ..., in_builder = builder)
12.
nrow(pbp)
11.
clean_pbp(., in_builder = builder)
10.
nrow(pbp)
9.
add_qb_epa(., in_builder = builder)
8.
nrow(pbp)
7.
add_xyac(., in_builder = builder)
6.
nrow(pbp)
5.
add_xpass(., in_builder = builder)
4.
fast_scraper(game_ids = game_ids, dir = dir, ..., in_builder = builder) %>%
clean_pbp(in_builder = builder) %>% add_qb_epa(in_builder = builder) %>%
add_xyac(in_builder = builder) %>% add_xpass(in_builder = builder)
3.
build_nflfastR_pbp(missing, rules = FALSE)
2.
update_db(force_rebuild = force_rebuild) at puntr_extras.R#37
1.
get_punts(years = 2021:2023, include_blocks = TRUE, seasontype = "REG")

dennisbrookner commented 12 months ago

Oh also! Earlier in the afternoon, I was getting this error along with a message that the KC JAX game from today wasn't yet available. I'm not seeing that anymore, but it leads me to believe that different data sources aren't agreeing about what games have finished.
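
A rough sketch of how such a disagreement could surface, using plain set differences in base R. This is not nflfastR's actual implementation. The three vectors and the game IDs in them are illustrative assumptions that only mimic the nflverse `season_week_away_home` ID style.

```r
# Illustrative only: games one source (e.g. a schedule) marks as completed.
completed_per_schedule <- c("2023_02_GB_ATL", "2023_02_KC_JAX", "2023_02_SF_LA")

# Illustrative only: games already stored in the local database.
already_in_database <- c("2023_02_GB_ATL")

# Illustrative only: games whose play-by-play a second source has published.
available_pbp <- c("2023_02_GB_ATL", "2023_02_SF_LA")

# Games the updater would consider "missing" and try to download.
missing_games <- setdiff(completed_per_schedule, already_in_database)
print(missing_games)

# Games marked as finished by one source but not yet served by the other.
not_yet_available <- setdiff(missing_games, available_pbp)
print(not_yet_available)
```

If the two sources update at different times, a game can land in `not_yet_available` for a while and then drop out once its data is published, which would match the message coming and going.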

guga31bb commented 12 months ago

Please try updating nflreadr to the dev version and then try again.

dennisbrookner commented 12 months ago

I'm confused about this, because I believe that I did that:

> nflverse::nflverse_update(devel = TRUE)
ℹ The following packages are out of date:
• nfl4th   (1.0.4 -> 1.0.4.9000)
• nflplotR (1.1.0 -> 1.1.0.9006)
• nflreadr (1.4.0 -> 1.4.0.3   )

but isn't the update_db() function part of nflfastR? Do I need to update that too? Or am I using the wrong function?

EDIT: Is the above call not actually updating anything, just checking versions?

dennisbrookner commented 12 months ago

Yay that worked, thanks!! That's my bad, I just skimmed the output of nflverse::nflverse_update(devel = TRUE) and didn't realize that things were out of date!
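
For anyone who wants a check that is harder to skim past, here is a minimal base R sketch that compares installed versions against dev versions. The helper name `is_behind` is hypothetical. The version strings are copied from the output quoted in this thread and are not looked up live.

```r
# Hypothetical helper: TRUE where `installed` is older than `available`.
is_behind <- function(installed, available) {
  package_version(installed) < package_version(available)
}

# Version strings as reported earlier in this thread.
versions <- data.frame(
  package   = c("nfl4th", "nflplotR", "nflreadr", "nflfastR"),
  installed = c("1.0.4", "1.1.0", "1.4.0", "4.5.1.9013"),
  dev       = c("1.0.4.9000", "1.1.0.9006", "1.4.0.3", "4.5.1.9013"),
  stringsAsFactors = FALSE
)

# Flag every package whose installed version trails the dev version.
versions$behind <- is_behind(versions$installed, versions$dev)
print(versions)
```

For a package that is actually installed, `utils::packageVersion("nflreadr")` returns the installed version and can be compared against a target in the same way.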

guga31bb commented 12 months ago

Glad it worked!