jldbc / pybaseball

Pull current and historical baseball statistics using Python (Statcast, Baseball Reference, FanGraphs)
MIT License

Fix the Lahman Database Scraping #434

Open jmaslek opened 3 months ago

jmaslek commented 3 months ago

This PR redirects the Lahman database download from a GitHub link that now returns a 404 to the Dropbox link found on baseball1.com.

In order to extract the data, py7zr was added to the requirements.
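For readers following along, here is a minimal sketch of what downloading and unpacking a 7z archive with py7zr could look like. It is not the code from this PR: the function names, the cache layout, and the `url` parameter are all assumptions made for illustration, and the actual Dropbox URL is not given in this thread, so it is left as an argument.

```python
import urllib.request
from pathlib import Path


def extract_7z(archive_path: Path, dest: Path) -> None:
    """Unpack a .7z archive into dest with py7zr (third-party: pip install py7zr)."""
    import py7zr  # imported lazily so the other helpers work without it

    with py7zr.SevenZipFile(archive_path, mode="r") as archive:
        archive.extractall(path=dest)


def find_table(root: Path, table: str) -> Path:
    """Locate '<table>.csv' anywhere under root, ignoring case.

    The archive's internal folder layout is not known here, so the
    search is recursive rather than tied to a fixed path.
    """
    wanted = f"{table}.csv".lower()
    for candidate in sorted(root.rglob("*")):
        if candidate.is_file() and candidate.name.lower() == wanted:
            return candidate
    raise FileNotFoundError(f"no {table}.csv found under {root}")


def fetch_lahman(url: str, cache_dir: Path) -> Path:
    """Download the archive once, extract it, and return the extraction directory.

    `url` is a placeholder for wherever the archive is hosted.
    """
    cache_dir.mkdir(parents=True, exist_ok=True)
    archive_path = cache_dir / "lahman.7z"  # hypothetical cache file name
    if not archive_path.exists():
        urllib.request.urlretrieve(url, archive_path)
    extract_7z(archive_path, cache_dir)
    return cache_dir
```

Searching recursively for the CSV keeps the sketch independent of how the archive happens to be organized, which matters if the upstream file is repackaged later.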

schorrm commented 3 months ago

Could this be addressed by moving the link to something in https://github.com/chadwickbureau/retrosheet?

jmaslek commented 3 months ago

> Could this be addressed by moving the link to something in https://github.com/chadwickbureau/retrosheet?

Looks like there may be some overlapping files, but nothing that mimics Lahman's db.

bdilday commented 3 months ago

what do y'all think about extracting the data and posting it to a repo in github? maybe even embedded in pybaseball?

jmaslek commented 3 months ago

> what do y'all think about extracting the data and posting it to a repo in github? maybe even embedded in pybaseball?

I have no issue with that (I assume there are no licensing issues). I'm happy to add a folder here or put the files on my own GitHub.
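If the extracted CSVs were embedded in the package, reading them might look roughly like the sketch below. This is only an illustration of the idea discussed above: the `data` subfolder, the function name, and the assumption that the files ship as UTF-8 CSVs are all hypothetical, and nothing here reflects an existing pybaseball layout.

```python
import csv
from importlib import resources


def load_bundled_table(package: str, table: str) -> list[dict[str, str]]:
    """Read '<package>/data/<table>.csv' from an installed package.

    Assumes a hypothetical 'data' folder shipped inside the package.
    Requires Python 3.9+ for importlib.resources.files.
    """
    csv_file = resources.files(package) / "data" / f"{table}.csv"
    with csv_file.open("r", encoding="utf-8", newline="") as handle:
        return list(csv.DictReader(handle))
```

Bundling the files this way would remove the runtime download and the py7zr dependency, at the cost of a larger package and a manual update whenever a new Lahman release comes out.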