Closed bentrevett closed 5 years ago
Great idea. I'd also suggest making the cache specific to the current version, so that when a new Lahman release comes out it'll download the new one even if the old one is already there.
I like it @bentrevett. Want to submit a PR?
Also +1 to @schorrm's comment. Is the naming convention consistent year over year? If so, it would be an improvement to add a safe rule that detects when the next season's version is available and increments the year used in the URL.
Yep, I'll do it tonight.
I'm not sure the Lahman DB has consistent naming.
From http://www.seanlahman.com/baseball-archive/statistics/ I can see that the 2015 and 2016 versions are named http://seanlahman.com/files/database/baseballdatabank-master_2016-02-29.zip and http://seanlahman.com/files/database/baseballdatabank-master_2016-03-02.zip.
Seems to just be the date they were uploaded (?).
One possible solution is to instead get the data from https://github.com/chadwickbureau/baseballdatabank. This seems to be in the same format as the Lahman DB and is frequently updated.
What if we have the current version as a string and when Lahman updates just push the new version to a package update? A bit clunky, but barring better version management there...
That's what it's currently doing, I think hardcoding it is fine. I'd rather this stay up and be out of date for a while before we manually update the url than try to guess next year's naming convention and break it.
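For illustration, here is a minimal sketch of how a hardcoded URL could still give version-specific caching, as suggested earlier in the thread. The helper names (`version_from_url`, `cache_dir_for`, `needs_download`) are hypothetical and not part of the library; the URL is the one quoted above. The idea is to name the cache folder after the zip file, so bumping the URL in a package update automatically triggers a fresh download.

```python
from pathlib import Path, PurePosixPath
from urllib.parse import urlparse

# Hardcoded on purpose; updated manually when a new release appears.
LAHMAN_URL = "http://seanlahman.com/files/database/baseballdatabank-master_2016-03-02.zip"


def version_from_url(url):
    """Use the zip file name (without extension) as the version label."""
    return PurePosixPath(urlparse(url).path).stem


def cache_dir_for(url, base):
    """Each version gets its own folder under the cache base directory."""
    return Path(base) / version_from_url(url)


def needs_download(url, base):
    """True unless a non-empty folder for this exact version already exists."""
    target = cache_dir_for(url, base)
    return not (target.is_dir() and any(target.iterdir()))
```

With this layout an old download stays on disk but is simply ignored once the URL changes, so nothing has to guess next year's file name.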
Really like this library, but one thing I don't get is why the Lahman DB needs to be re-downloaded every time you try and use a function interfacing with the Lahman DB.
I'm proposing something like this:
And then all Lahman interfacing functions can be edited like so:
This way you only have to call download_lahman once, and every subsequent time you call parks() it will just use the downloaded DB.

This probably isn't the most elegant way to do it, but I think something like this would be a good idea.
Happy to discuss, do the changes myself and file a pull request!
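A minimal sketch of the download-once idea described in this post. This is not the code originally proposed, only an illustration under stated assumptions: the cache location, the `fetch` parameter, the `read_table` helper and the `Parks.csv` file name are hypothetical, and it returns plain dicts via the standard library rather than whatever the library actually returns.

```python
import csv
import io
import zipfile
from pathlib import Path
from urllib.request import urlopen

# URL quoted earlier in the thread; cache location is an assumption.
LAHMAN_URL = "http://seanlahman.com/files/database/baseballdatabank-master_2016-03-02.zip"
CACHE_DIR = Path.home() / ".lahman_cache"


def _fetch(url):
    """Download the raw bytes of the zip archive."""
    with urlopen(url) as response:
        return response.read()


def download_lahman(url=LAHMAN_URL, cache_dir=CACHE_DIR, fetch=_fetch):
    """Download and unpack the archive unless cache_dir already has files."""
    cache_dir = Path(cache_dir)
    if cache_dir.is_dir() and any(cache_dir.iterdir()):
        return cache_dir  # already downloaded, nothing to do
    cache_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(io.BytesIO(fetch(url))) as archive:
        archive.extractall(cache_dir)
    return cache_dir


def read_table(name, url=LAHMAN_URL, cache_dir=CACHE_DIR, fetch=_fetch):
    """Return the rows of one CSV file as dicts, downloading on first use only."""
    root = download_lahman(url, cache_dir, fetch)
    # The archive's folder layout is not assumed; search for the file by name.
    matches = sorted(root.rglob(name))
    if not matches:
        raise FileNotFoundError(name)
    with open(matches[0], newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


def parks(url=LAHMAN_URL, cache_dir=CACHE_DIR, fetch=_fetch):
    """Example of a Lahman-facing function rewritten to use the cache."""
    return read_table("Parks.csv", url, cache_dir, fetch)
```

One caveat with this sketch: an interrupted extraction leaves a non-empty folder behind, so a sturdier version would unpack into a temporary directory and rename it once complete.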