jldbc / pybaseball

Pull current and historical baseball statistics using Python (Statcast, Baseball Reference, FanGraphs)
MIT License

bug: batting_stats not pulling all data (plus more) #193

Closed shravanramamurthy closed 3 years ago

shravanramamurthy commented 3 years ago

Hello, when using the batting_stats function, it doesn't seem to pull all the data. For instance, batting_stats(2020) returns a dataframe of only 142 rows, whereas batting_stats_bref(2020) returns one of 588 rows. Also, would it be possible to include the team each player plays for in the playerid_lookup output? That table doesn't include accents on certain letters, which makes it very hard to reconcile pitch-by-pitch data with players and teams. Thanks!
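For the accent mismatch, one possible workaround is to normalize names before matching. This is an assumption, not something pybaseball provides, and the helper name strip_accents is hypothetical. Unicode NFKD decomposition splits an accented letter into its base letter plus a combining mark, so dropping the marks makes "José" and "Jose" compare equal.

```python
import unicodedata


def strip_accents(name: str) -> str:
    """Hypothetical helper: lowercase `name` and remove diacritics for matching."""
    # NFKD turns e.g. "é" into "e" followed by a combining acute accent.
    decomposed = unicodedata.normalize("NFKD", name)
    # Keep only the non-combining characters, i.e. drop the accent marks.
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # casefold() is a more aggressive lower() intended for caseless matching.
    return base.casefold().strip()


print(strip_accents("José Abreu"))  # jose abreu
```

Applying it to the name columns on both sides (for example with pandas `Series.map(strip_accents)`) before a merge avoids misses caused only by accents. It does not resolve genuinely different spellings or two players who share a name.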

wfordh commented 3 years ago

Hi @shravanramamurthy! We recently updated how we pull data from FanGraphs, and it looks like the new default is to pull only qualified batters (seen here). Could you try specifying the qual parameter in your call and see what you get? If you set it to 1 (the previous default), it should return a dataframe of length 581. I imagine the remaining difference of 7 rows is due to partial seasons.
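A quick way to see the effect of qual is to compare row counts side by side. The sketch below is illustrative only: row_counts and its fetch_fg/fetch_bref parameters are hypothetical, added so the comparison can be exercised without network access. By default they fall back to pybaseball's batting_stats and batting_stats_bref.

```python
def row_counts(season, fetch_fg=None, fetch_bref=None):
    """Compare row counts from FanGraphs (default vs. qual=1) and Baseball Reference.

    fetch_fg / fetch_bref default to pybaseball.batting_stats and
    pybaseball.batting_stats_bref. Making them injectable is a hypothetical
    design choice so the comparison can be checked offline.
    """
    if fetch_fg is None or fetch_bref is None:
        # Imported lazily: the real functions download data when called.
        from pybaseball import batting_stats, batting_stats_bref
        fetch_fg = fetch_fg or batting_stats
        fetch_bref = fetch_bref or batting_stats_bref
    return {
        # New default: qualified batters only, per this thread.
        "fangraphs_default": len(fetch_fg(season)),
        # qual=1: the previous default, which includes nearly every batter.
        "fangraphs_qual_1": len(fetch_fg(season, qual=1)),
        "bref": len(fetch_bref(season)),
    }
```

With pybaseball installed, calling row_counts(2020) should show the gap. The thread reports 142 rows at the default and 581 with qual=1, against 588 from batting_stats_bref(2020); live numbers may have changed since.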

We will work on updating either the documentation to reflect the new default or the default itself.

Your second question about playerid_lookup() seems to be a separate issue entirely. Would you mind making another issue to address it?

timbroderick commented 3 years ago

FYI, adding the qual argument to pitching_stats returns the complete dataset as well.
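Since the same default change affects pitching_stats, a small wrapper that always passes qual explicitly to both calls may help. This is a sketch, not part of pybaseball: the name full_season_stats and its injectable fetch parameters are hypothetical, and by default it uses pybaseball's batting_stats and pitching_stats.

```python
def full_season_stats(season, qual=1, fetch_batting=None, fetch_pitching=None):
    """Fetch batting and pitching tables with an explicit `qual` threshold.

    Defaults to pybaseball.batting_stats / pybaseball.pitching_stats; the
    injectable fetchers are a hypothetical convenience for offline testing.
    """
    if fetch_batting is None or fetch_pitching is None:
        # Imported lazily: the real functions download from FanGraphs when called.
        from pybaseball import batting_stats, pitching_stats
        fetch_batting = fetch_batting or batting_stats
        fetch_pitching = fetch_pitching or pitching_stats
    # Pass qual to both calls instead of relying on the default, which
    # (per this thread) now returns qualified players only.
    return fetch_batting(season, qual=qual), fetch_pitching(season, qual=qual)
```

Passing qual every time keeps results stable even if the library's default changes again.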