keberwein / mlbgameday

Multi-core processing of 'Gameday' data from Major League Baseball Advanced Media. Additional tools to parallelize large data sets and write them to a database.

2019 season get_payload() Error: Column `on_1b` must be length 4055 (the number of rows) or one, not 0 #13

Closed legopin closed 5 years ago

legopin commented 5 years ago

Expected Behavior

Expected to get a data frame of pitch data for the games on 2019-03-28: http://gd2.mlb.com/components/game/mlb/year_2019/month_03/day_28

df = get_payload(start = '2019-03-28', end = '2019-03-28')

Games listed here https://www.mlb.com/scores/2019-03-28

Current Behavior

The download fails with the following error message:

Gathering Gameday data, please be patient...
Error: Column `on_1b` must be length 4055 (the number of rows) or one, not 0
In addition: Warning messages:
1: NAs introduced by coercion 
2: NAs introduced by coercion 
3: NAs introduced by coercion 
4: NAs introduced by coercion 
5: NAs introduced by coercion 
6: NAs introduced by coercion 
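
The message is consistent with a column that is absent from the downloaded data, so the code ends up trying to fill a 4055-row table with a zero-length vector. A minimal base R sketch of that failure mode follows; it is an illustration only, not the package's actual code path, and the `atbat` data frame and `absent_col` name are made up for the example. dplyr/tibble word the error differently from base R, but the cause is the same.

```r
# Toy stand-in for the at-bat table; note there is no `on_1b` column.
atbat <- data.frame(num = 1:3)

# Asking for a column that does not exist yields NULL, which has length 0.
absent_col <- atbat$on_1b
length(absent_col)

# Writing that zero-length vector back into a 3-row data frame fails.
result <- tryCatch({
  atbat$on_1b <- as.character(absent_col)
  "assigned"
}, error = function(e) conditionMessage(e))

print(result)
```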

However, the download succeeds when the date is 2018-10-28 or earlier:

df = get_payload(start = '2018-10-28', end = '2018-10-28')

Attempted Solution

Reinstalled the package from the latest GitHub dev version. This did not solve the issue.

devtools::install_github("keberwein/mlbgameday", force = TRUE)

Context

sessionInfo()
R version 3.5.1 (2018-07-02)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS High Sierra 10.13.6

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] doParallel_1.0.14 iterators_1.0.10  foreach_1.4.4     mlbgameday_0.1.4  jsonlite_1.5     
 [6] stringi_1.4.3     RSQLite_2.1.1     DBI_1.0.0         dbplyr_1.2.2      dplyr_0.8.0.1    
[11] config_0.3       

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.1       rstudioapi_0.8   xml2_1.2.0       magrittr_1.5     tidyselect_0.2.5
 [6] bit_1.1-14       R6_2.4.0         rlang_0.3.1      stringr_1.4.0    blob_1.1.1      
[11] tools_3.5.1      yaml_2.2.0       bit64_0.9-7      assertthat_0.2.0 digest_0.6.17   
[16] tibble_2.1.1     crayon_1.3.4     tidyr_0.8.3      purrr_0.3.2      codetools_0.2-15
[21] curl_3.2         memoise_1.1.0    glue_1.3.1       compiler_3.5.1   pillar_1.3.1    
[26] pkgconfig_2.0.2 
Djgamelin commented 5 years ago

I get the same issue running the latest (I think) version. Just tried to pull yesterday's game data.

mogulman commented 5 years ago

Just for the record, I also have exactly the same experience as described above.

Djgamelin commented 5 years ago

I tested the function by trying to retrieve the 3/28/2019 data:

  • With dataset = "inning_all", I get the 'on_1b' error stated above.
  • With dataset = "bis_boxscore", I get an "Error in as.numeric(t) : cannot coerce type 'closure' to vector of type 'double'" message.
  • dataset = "linescore" seems to work fine.
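
For anyone who wants to repeat this check, here is a rough sketch of a helper that tries each dataset in turn and records whether the call errored. `probe_datasets` is a hypothetical name, not part of mlbgameday; it only assumes the fetch function accepts a `dataset` argument, as `get_payload()` does in the calls above.

```r
# Hypothetical helper: call `fetch` once per dataset and record the outcome.
# Extra arguments (e.g. start, end) are passed through to `fetch`.
probe_datasets <- function(fetch, datasets, ...) {
  vapply(datasets, function(ds) {
    tryCatch({
      fetch(dataset = ds, ...)
      "ok"
    }, error = function(e) paste("error:", conditionMessage(e)))
  }, character(1))
}

# Stand-in fetch function so the sketch runs without network access.
fake_fetch <- function(dataset, ...) {
  if (dataset == "bad") stop("boom")
  data.frame()
}

status <- probe_datasets(fake_fetch, c("good", "bad"))
print(status)
```

With the real package (this needs mlbgameday installed and network access), the call would look like `probe_datasets(mlbgameday::get_payload, c("inning_all", "bis_boxscore", "linescore"), start = "2019-03-28", end = "2019-03-28")`.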

legopin commented 5 years ago

I suspect that this changelog entry for 2018/9/20 on baseballsavant might be related to this issue:

Resolved bug with null pos_personid's. Fields re-named to fielder to resolve it.

https://baseballsavant.mlb.com/change-log

keberwein commented 5 years ago

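This bug has been resolved and a new version (0.2.0) has been pushed to CRAN. There were several schema changes to the 2019 data set. I have reconciled the schema changes so 2019 data will fit with older data. Changes are below.

  • score was removed from 2019 inning_all$atbat data. The column was inserted and cast as NA for 2019.
  • on_1, on_2, and on_3 were removed from 2019 inning_all$atbat. The columns were inserted and cast to NA.
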
https://cran.r-project.org/web/packages/mlbgameday/index.html
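
To confirm that an installation has the fix, the installed version can be compared against 0.2.0. The sketch below is one way to do that; `is_fixed_version` is a hypothetical helper name, and 0.1.4 is the version shown in the sessionInfo() output of the original report.

```r
# Hypothetical helper: TRUE when `installed` is at least the fixed release.
is_fixed_version <- function(installed, fixed = "0.2.0") {
  package_version(installed) >= package_version(fixed)
}

print(is_fixed_version("0.1.4"))  # version from the original report
print(is_fixed_version("0.2.0"))  # release that contains the fix

# Check the locally installed copy, if there is one.
if (requireNamespace("mlbgameday", quietly = TRUE)) {
  installed <- as.character(utils::packageVersion("mlbgameday"))
  message("mlbgameday ", installed, " has the fix: ", is_fixed_version(installed))
}
```

If the check comes back FALSE, `install.packages("mlbgameday")` should pull the current CRAN release.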

Much appreciation for the timely bug reports!

berkeley44 commented 5 years ago

This bug has been resolved and a new version (0.2.0) has been pushed to CRAN. There were several schema changes to the 2019 data set. I have reconciled the schema changes so 2019 data will fit with older data. Changes are below.

  • score was removed from 2019 inning_all$atbat data. The column was inserted and cast as NA for 2019.
  • on_1, on_2, and on_3 were removed from 2019 inning_all$atbat. The columns were inserted and cast to NA.
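
The general idea behind this kind of fix can be sketched in base R: add any column the newer data lacks, filled with NA, so that it lines up with older data. This is an illustration under assumptions, not the package's implementation; `add_missing_columns`, the toy data frames, and the exact column names used here are made up for the example.

```r
# Hypothetical helper: add each expected column that is absent, filled with NA.
add_missing_columns <- function(df, expected) {
  for (col in setdiff(expected, names(df))) {
    df[[col]] <- rep(NA, nrow(df))
  }
  df
}

# Toy "older season" table that still has the columns in question.
atbat_2018 <- data.frame(num = 1:2,
                         score = c("T", NA),
                         on_1b = c("123", NA),
                         stringsAsFactors = FALSE)

# Toy "2019" table where those columns are gone.
atbat_2019 <- data.frame(num = 1:3)

expected <- names(atbat_2018)
atbat_2019 <- add_missing_columns(atbat_2019, expected)

# Both tables now share the same columns and can be stacked.
combined <- rbind(atbat_2018, atbat_2019[expected])
print(combined)
```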

https://cran.r-project.org/web/packages/mlbgameday/index.html

Much appreciation for the timely bug reports!

Thanks for the update!