keberwein / mlbgameday

Multi-core processing of 'Gameday' data from Major League Baseball Advanced Media. Additional tools to parallelize large data sets and write them to a database.

Updated GIDs / Data refresh #2

Closed maximize22 closed 6 years ago

maximize22 commented 6 years ago

Hi, thanks for pulling this together. Just curious, do you have code that pulls the updated gids? It looks like you load them into a data file. The 2018 data (shells) are out on the MLB Gameday site, but neither pitchRx nor your code has the game listing pulled in. I was wondering if you had that before I build code to pull them in myself. Thanks!

keberwein commented 6 years ago

The code that pulls the updated gids is here. I probably won't update to 2018 until after the season, for a couple of reasons. 1) The package works even if the gids aren't in the internal database; it can grab them "on the fly" from the miniscoreboard. That's a bit slower, but it works just the same. 2) The gids in the internal dataset are all valid games, i.e. they were all played. Some of the gids in the 2018 shell will be rainouts or delays, and I don't want to introduce potential 404 URLs into the internal dataset.
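The lookup-then-fallback behavior described above can be sketched roughly as follows. This is a minimal illustration in Python rather than the package's own R, and it is not mlbgameday's actual implementation: `INTERNAL_GIDS`, `get_gids`, and the `fetch_from_scoreboard` callable are all hypothetical names, and the gid strings are made-up placeholders.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical stand-in for the package's internal dataset of validated gids,
# keyed by date string. The gid values are placeholders, not real games.
INTERNAL_GIDS: Dict[str, List[str]] = {
    "2017-04-02": ["gid_placeholder_a", "gid_placeholder_b"],
}


def get_gids(
    date: str,
    fetch_from_scoreboard: Callable[[str], List[str]],
    internal: Optional[Dict[str, List[str]]] = None,
) -> List[str]:
    """Return gids for a date, preferring the internal dataset.

    If the date is missing from the internal dataset, fall back to the
    slower "on the fly" lookup supplied by the caller.
    """
    if internal is None:
        internal = INTERNAL_GIDS
    cached = internal.get(date)
    if cached is not None:
        # Fast path: the gids were validated ahead of time.
        return list(cached)
    # Slow path: ask the (assumed) scoreboard lookup for this date.
    return fetch_from_scoreboard(date)
```

The caller supplies the scoreboard lookup, so the sketch makes no assumption about how the miniscoreboard is actually queried.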

Carson's package works a bit differently. He pulls the shells and does a tryCatch on every URL. I decided against this because it slowed things down. I'm not sure whether he's going to update this year; he hasn't made many commits lately.
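For comparison, the "try every URL" approach looks roughly like the sketch below. It is again a Python illustration under stated assumptions, not code from either package: `fetch_all_tolerant` and the `fetch` callable are hypothetical, and `OSError` stands in for whatever error a real download would raise on a 404.

```python
from typing import Callable, Iterable, List, Tuple


def fetch_all_tolerant(
    urls: Iterable[str],
    fetch: Callable[[str], str],
) -> Tuple[List[Tuple[str, str]], List[str]]:
    """Attempt every URL, collecting successes and recording failures.

    Each URL costs one attempt even when the game was never played,
    which is the slowdown mentioned above.
    """
    results: List[Tuple[str, str]] = []
    failed: List[str] = []
    for url in urls:
        try:
            results.append((url, fetch(url)))
        except OSError:
            # Assumed failure mode: rainouts and delays have no data behind them.
            failed.append(url)
    return results, failed
```

Validating gids ahead of time, as the internal dataset does, avoids paying for the failed attempts at download time.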

In any event, this package will work with new games even if the gids aren't in the internal database. If you want to update your own internal database, the linked code will do it, but there's no real reason to do so.