jldbc / pybaseball

Pull current and historical baseball statistics using Python (Statcast, Baseball Reference, FanGraphs)
MIT License
1.23k stars, 330 forks

Scrape MLB IDs from Baseball-Reference #222

Closed: marek-slipski closed this 3 years ago

marek-slipski commented 3 years ago

Context

Baseball-Reference player stat pages (like this) have each player's MLB ID embedded in links. Scraping these IDs would be useful because batting_stats_range and pitching_stats_range return a DataFrame whose only player identifier is the Baseball-Reference player name, and names are difficult to join on. This change grabs the IDs from the links and adds an mlbID column to the batting and pitching tables.
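For illustration, here is a rough sketch of the idea using only the standard library. It is not necessarily how the PR implements it. It assumes the player links carry the ID in an `mlb_ID` query parameter; the parameter name, the sample HTML, the URLs, and the IDs below are all hypothetical placeholders, and `extract_mlb_ids` is an illustrative helper, not a pybaseball function.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse, parse_qs


class MlbIdLinkParser(HTMLParser):
    """Collect (link text, mlbID) pairs from anchors whose href has an mlb_ID parameter."""

    def __init__(self):
        super().__init__()
        self.rows = []           # collected (name, mlbID) tuples
        self._current_id = None  # ID of the anchor currently being read, if any
        self._text = []          # text fragments seen inside that anchor

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href") or ""
        # ASSUMPTION: the ID is exposed as an "mlb_ID" query parameter.
        ids = parse_qs(urlparse(href).query).get("mlb_ID")
        if ids:
            self._current_id = ids[0]
            self._text = []

    def handle_data(self, data):
        if self._current_id is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._current_id is not None:
            name = "".join(self._text).strip()
            self.rows.append((name, self._current_id))
            self._current_id = None


def extract_mlb_ids(html):
    """Return a list of (player name, mlbID) tuples found in the given HTML."""
    parser = MlbIdLinkParser()
    parser.feed(html)
    return parser.rows


# Hypothetical table fragment; names, URLs, and IDs are made up for the example.
SAMPLE_HTML = """
<table>
  <tr><td><a href="/redirect.fcgi?player=1&amp;mlb_ID=123456">Example Pitcher</a></td><td>3.21</td></tr>
  <tr><td><a href="/redirect.fcgi?player=1&amp;mlb_ID=654321">Another Pitcher</a></td><td>4.05</td></tr>
  <tr><td><a href="/about/">No ID here</a></td><td></td></tr>
</table>
"""

rows = extract_mlb_ids(SAMPLE_HTML)
for name, mlb_id in rows:
    print(name, mlb_id)
```

Links without the parameter are skipped, so a row with no player link simply contributes no ID. In a real table, that row would need an explicit missing value in the mlbID column.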

Example output of pitching_stats_range: image

schorrm commented 3 years ago

We can do this if you update the unit tests to match

marek-slipski commented 3 years ago

I think that should do it, but let me know if I'm missing something (I'm new at this).

schorrm commented 3 years ago

Whoops, missed that. Can you merge the upstream testing changes back in, push, and see if it passes CI?