keberwein / mlbgameday

Multi-core processing of 'Gameday' data from Major League Baseball Advanced Media. Additional tools to parallelize large data sets and write them to a database.

Linescore Dataset error #7

Closed jestarr closed 6 years ago

jestarr commented 6 years ago

I'm getting the error below.

innings_df <- get_payload(start = "2017-01-01", end = "2018-01-01", dataset = "linescore", db_con = con)
Gathering Gameday data, please be patient...
Processing data chunk 1 of 7
Processing data chunk 2 of 7
Error: Column name mismatch.

keberwein commented 6 years ago

This issue was due to some bad data logic, exposed by pre-season exhibition games. The logic has been re-mapped and the issue seems to be fixed in the development version: https://github.com/keberwein/mlbgameday/commit/7a90325d187258a4a94358766bfb8d525f1203aa

jestarr commented 6 years ago

I updated to the new version and tried it last night and it still isn't working.

keberwein commented 6 years ago

Still chasing this bug down. In the meantime, you can pull the linescore object in as a dataframe and simply write that dataframe to your database. The bug lies somewhere in my "data chunking" logic, but I don't see the error when the db_con argument isn't called. Try the following for now, and I'll work on the bug.

library(doParallel)
library(DBI)
library(RSQLite)

con <- dbConnect(RSQLite::SQLite(), dbname = "gameday.sqlite3")

# Leave a couple of cores free for the rest of the system.
no_cores <- detectCores() - 2
cl <- makeCluster(no_cores)
registerDoParallel(cl)

linescore <- get_payload(start = "2017-01-01", end = "2018-01-01", dataset = "linescore")

stopCluster(cl)
rm(cl)

# Use a loop to write all tables to the database.
for (i in names(linescore)) DBI::dbWriteTable(conn = con, value = linescore[[i]], name = i, append = TRUE)
# Remove and garbage collect.
rm(linescore)
gc()

jestarr commented 6 years ago

Thanks, Kris. Since some of the data tables were already written, how do I delete those linescore tables?

keberwein commented 6 years ago

You could probably just drop the affected tables entirely with DBI::dbRemoveTable(con, "game") and DBI::dbRemoveTable(con, "game_media"). After the tables are removed, try your write again.
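For reference, a minimal sketch of that cleanup, assuming the same gameday.sqlite3 connection from the earlier snippet and that only the game and game_media tables were partially written:

```r
library(DBI)
library(RSQLite)

# Reconnect to the same SQLite file used above.
con <- dbConnect(RSQLite::SQLite(), dbname = "gameday.sqlite3")

# Drop the partially written tables, if they exist.
for (tbl in c("game", "game_media")) {
  if (dbExistsTable(con, tbl)) dbRemoveTable(con, tbl)
}

dbListTables(con)  # confirm the tables are gone
dbDisconnect(con)
```

dbRemoveTable() drops the table outright rather than truncating it; the subsequent dbWriteTable(..., append = TRUE) calls will recreate each table on first write.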

keberwein commented 6 years ago

OK, this issue has been fixed in the latest development version 0.1.1. You can do a GitHub install if you want a quick fix.
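For anyone landing here before the CRAN release, a GitHub install of the development version would look something like this (assuming the devtools package is available; remotes::install_github works the same way):

```r
# Install the development version of mlbgameday from GitHub.
devtools::install_github("keberwein/mlbgameday")
```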

The issue was caused by inconsistent column ordering coming from the XML documents. We also saw this issue in the inning_all data set.

This will be the next CRAN update. However, I need to check the other data sets before making the push.

keberwein commented 6 years ago

This issue was fixed with the latest CRAN release 0.1.1.