nflverse / nflreadr

Efficiently download nflverse data
https://nflreadr.nflverse.com/
Other
62 stars 12 forks source link

[BUG] nflreadr:::join_coalesce() converts dates to ints when joining #213

Closed john-b-edwards closed 1 year ago

john-b-edwards commented 1 year ago

Is there an existing issue for this?

Have you installed the latest development version of the package(s) in question?

What version of the package do you have?

1.3.2.11

Describe the bug

The internal function nflreadr:::join_coalesce() converts date objects to integers when joining.

Reprex

df1 <- data.frame(
  name = c("Tom",
           "Bill",
           "Jen"),
  birth_date = c("1998-05-01",
                 "1997-12-12",
                 NA_character_)
) |>
  dplyr::mutate(birth_date = lubridate::as_date(birth_date))

df2 <- data.frame(
  name = c("Jen",
           "Tom",
           "Bill"),
  birth_date = c("1995-01-31",
                 NA_character_,
                 "1997-12-12")
) |>
  dplyr::mutate(birth_date = lubridate::as_date(birth_date))

nflreadr:::join_coalesce(
  df1,
  df2,
  by=c("name")
)
#>    name birth_date
#> 1: Bill      10207
#> 2:  Jen       9161
#> 3:  Tom      10347

Expected Behavior

The object returned by nflreadr:::join_coalesce() should instead look like:

#>   name birth_date
#> 1  Tom 1998-05-01
#> 2 Bill 1997-12-12
#> 3  Jen 1995-01-31

nflverse_sitrep

── System Info ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
• R version 4.3.1 (2023-06-16 ucrt)   • Running under: Windows 11 x64 (build 22000)
── nflverse Packages ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
• nflreadr (1.3.2.11)    • nflseedR (1.2.0)  • nflplotR (1.1.0.9006)  
• nflfastR (4.5.1.9009)  • nfl4th   (1.0.4)  • nflverse (1.0.3)       
── nflverse Options ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
• nflreadr.verbose: FALSE
── nflverse Dependencies ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
• askpass     (1.1)     • hms        (1.1.3)    • proto        (1.0.0)    
• backports   (1.4.1)   • httr       (1.4.6)    • purrr        (1.0.1)    
• cachem      (1.0.8)   • isoband    (0.2.7)    • R6           (2.5.1)    
• cli         (3.6.1)   • janitor    (2.2.0)    • rappdirs     (0.3.3)    
• codetools   (0.2-19)  • jsonlite   (1.8.7)    • RColorBrewer (1.1-3)    
• colorspace  (2.1-0)   • labeling   (0.4.2)    • Rcpp         (1.0.11)   
• cpp11       (0.4.6)   • lattice    (0.21-8)   • rlang        (1.1.1)    
• crayon      (1.5.2)   • lifecycle  (1.0.3)    • rstudioapi   (0.15.0)   
• curl        (5.0.1)   • listenv    (0.9.0)    • scales       (1.2.1)    
• data.table  (1.14.8)  • lubridate  (1.9.2)    • snakecase    (0.11.0)   
• digest      (0.6.33)  • magick     (2.7.5)    • stringi      (1.7.12)   
• dplyr       (1.1.2)   • magrittr   (2.0.3)    • stringr      (1.5.0)    
• fansi       (1.0.4)   • MASS       (7.3-60)   • sys          (3.4.2)    
• farver      (2.1.1)   • Matrix     (1.6-0)    • tibble       (3.2.1)    
• fastmap     (1.1.1)   • memoise    (2.0.1)    • tidyr        (1.3.0)    
• fastrmodels (1.0.2)   • mgcv       (1.8-42)   • tidyselect   (1.2.0)    
• furrr       (0.3.1)   • mime       (0.12)     • timechange   (0.2.0)    
• future      (1.33.0)  • munsell    (0.5.0)    • utf8         (1.2.3)    
• generics    (0.1.3)   • nlme       (3.1-162)  • vctrs        (0.6.3)    
• ggplot2     (3.4.2)   • openssl    (2.1.0)    • viridisLite  (0.4.2)    
• globals     (0.16.2)  • parallelly (1.36.0)   • withr        (2.5.0)    
• glue        (1.6.2)   • pillar     (1.9.0)    • xgboost      (2.0.0.1)  
• gsubfn      (0.7)     • pkgconfig  (2.0.3)      
• gtable      (0.3.3)   • progressr  (0.13.0)     
─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────

Screenshots

No response

Additional context

No response

tanho63 commented 1 year ago

Culprit seems to be the ifelse() used in %c%. could replace with vec_assign() + vec_detect_missing()? https://vctrs.r-lib.org/reference/vec_slice.html

mrcaseb commented 1 year ago

Indeed an ifelse problem with both lubridate dates and base dates. Going to fix now with data.table

library(nflreadr)
`%c%` <- nflreadr:::`%c%`
d1 <- lubridate::as_date(c("1998-05-01",
                           "1997-12-12",
                           NA_character_))
d2 <- lubridate::as_date(c("1995-01-31",
                           NA_character_,
                           "1997-12-12"))
d1 %c% d2
#> [1] 10347 10207 10207

d3 <- as.Date(c("1998-05-01",
                "1997-12-12",
                NA_character_))
d4 <- as.Date(c("1995-01-31",
                NA_character_,
                "1997-12-12"))
d3 %c% d4
#> [1] 10347 10207 10207

Created on 2023-09-05 with reprex v2.0.2