jldbc / pybaseball

Pull current and historical baseball statistics using Python (Statcast, Baseball Reference, FanGraphs)
MIT License
1.26k stars 333 forks

pybaseball.pitching_stats() returns batting stats, not pitching stats #267

Closed armstjc closed 2 years ago

armstjc commented 2 years ago

When running the following code:

import pybaseball
import pandas as pd

def getFanGraphsBattingStats(start=2020, end=2021):
    for i in range(start,end+1):
        data = pybaseball.batting_stats(i)
        data.to_csv(f'Data/FanGraphs/Batting/{i}_fangraphs_batting.csv',index=False)
        print(data)

def getFanGraphsPitchingStats(start=2020, end=2021):
    for i in range(start,end+1):
        data = pybaseball.pitching_stats(i)
        data.to_csv(f'Data/FanGraphs/Pitching/{i}_fangraphs_pitching.csv',index=False)
        print(data)

if __name__ == "__main__":
    pybaseball.cache.enable()
    getFanGraphsBattingStats()
    getFanGraphsPitchingStats()

It does not get the pitching stats as intended, as shown below:


      IDfg  Season                Name   Team  ...   CSW%    xBA   xSLG  xwOBA
18   19709    2020  Fernando Tatis Jr.    SDP  ...  0.269  0.297  0.614  0.419
1     5361    2020     Freddie Freeman    ATL  ...  0.191  0.341  0.660  0.464
4    13510    2020        Jose Ramirez    CLE  ...  0.229  0.263  0.505  0.371
9    15676    2020          Jose Abreu    CHW  ...  0.293  0.299  0.587  0.398
20   13611    2020        Mookie Betts    LAD  ...  0.272  0.281  0.481  0.359
..     ...     ...                 ...    ...  ...    ...    ...    ...    ...
130  13145    2020           Josh Bell    PIT  ...  0.284  0.228  0.381  0.297
139   6153    2020     Eduardo Escobar    ARI  ...  0.260  0.261  0.394  0.305
117   3892    2020        Josh Reddick    HOU  ...  0.260  0.245  0.358  0.300
128   6184    2020       J.D. Martinez    BOS  ...  0.263  0.229  0.444  0.316
136  10071    2020     Jonathan Villar  - - -  ...  0.266  0.211  0.281  0.256

[142 rows x 319 columns]
      IDfg  Season                Name   Team  ...   CSW%    xBA   xSLG  xwOBA
3    19709    2021  Fernando Tatis Jr.    SDP  ...  0.270  0.279  0.618  0.406
1    20123    2021           Juan Soto    WSN  ...  0.263  0.304  0.544  0.430
8    16252    2021         Trea Turner  - - -  ...  0.262  0.303  0.484  0.362
0    11579    2021        Bryce Harper    PHI  ...  0.263  0.301  0.610  0.430
20   13510    2021        Jose Ramirez    CLE  ...  0.233  0.281  0.505  0.374
..     ...     ...                 ...    ...  ...    ...    ...    ...    ...
123  10243    2021      Randal Grichuk    TOR  ...  0.249  0.233  0.402  0.294
95   14221    2021         Jorge Soler  - - -  ...  0.269  0.249  0.493  0.354
125   2396    2021      Carlos Santana    KCR  ...  0.242  0.244  0.421  0.334
118   1744    2021      Miguel Cabrera    DET  ...  0.274  0.231  0.415  0.313
126  15117    2021       Hunter Dozier    KCR  ...  0.302  0.224  0.388  0.299

[132 rows x 319 columns]
      IDfg  Season                Name   Team  ...   CSW%    xBA   xSLG  xwOBA
18   19709    2020  Fernando Tatis Jr.    SDP  ...  0.269  0.297  0.614  0.419
1     5361    2020     Freddie Freeman    ATL  ...  0.191  0.341  0.660  0.464
4    13510    2020        Jose Ramirez    CLE  ...  0.229  0.263  0.505  0.371
9    15676    2020          Jose Abreu    CHW  ...  0.293  0.299  0.587  0.398
20   13611    2020        Mookie Betts    LAD  ...  0.272  0.281  0.481  0.359
..     ...     ...                 ...    ...  ...    ...    ...    ...    ...
130  13145    2020           Josh Bell    PIT  ...  0.284  0.228  0.381  0.297
139   6153    2020     Eduardo Escobar    ARI  ...  0.260  0.261  0.394  0.305
117   3892    2020        Josh Reddick    HOU  ...  0.260  0.245  0.358  0.300
128   6184    2020       J.D. Martinez    BOS  ...  0.263  0.229  0.444  0.316
136  10071    2020     Jonathan Villar  - - -  ...  0.266  0.211  0.281  0.256

[142 rows x 319 columns]
      IDfg  Season                Name   Team  ...   CSW%    xBA   xSLG  xwOBA
3    19709    2021  Fernando Tatis Jr.    SDP  ...  0.270  0.279  0.618  0.406
1    20123    2021           Juan Soto    WSN  ...  0.263  0.304  0.544  0.430
8    16252    2021         Trea Turner  - - -  ...  0.262  0.303  0.484  0.362
0    11579    2021        Bryce Harper    PHI  ...  0.263  0.301  0.610  0.430
20   13510    2021        Jose Ramirez    CLE  ...  0.233  0.281  0.505  0.374
..     ...     ...                 ...    ...  ...    ...    ...    ...    ...
123  10243    2021      Randal Grichuk    TOR  ...  0.249  0.233  0.402  0.294
95   14221    2021         Jorge Soler  - - -  ...  0.269  0.249  0.493  0.354
125   2396    2021      Carlos Santana    KCR  ...  0.242  0.244  0.421  0.334
118   1744    2021      Miguel Cabrera    DET  ...  0.274  0.231  0.415  0.313
126  15117    2021       Hunter Dozier    KCR  ...  0.302  0.224  0.388  0.299

[132 rows x 319 columns]
The thread 'MainThread' (0x1) has exited with code 0 (0x0).
The program 'python.exe' has exited with code 4294967295 (0xffffffff).

tjburch commented 2 years ago

This is effectively a duplicate of #221. I was unable to reproduce it, but it seems like others are running into it as well. @schorrm - any thoughts here?

bdilday commented 2 years ago

I took a look at it and I was able to reproduce it (using pybaseball.cache.enable() as shown in the example above).

It seems to me that the caching is unable to distinguish between the two calls. I speculate that this is because both calls go through the same fetch method https://github.com/jldbc/pybaseball/blob/master/pybaseball/datasources/fangraphs.py#L224-L226

which is inherited by classes derived from the (abstract) FangraphsDataTable class https://github.com/jldbc/pybaseball/blob/master/pybaseball/datasources/fangraphs.py#L76-L81
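To illustrate the kind of collision I mean, here is a minimal sketch. It is not pybaseball's actual cache code: StatsTable, BattingTable, PitchingTable, naive_cache and class_aware_cache are hypothetical names made up for this example. It only assumes a cache that builds its key from the decorated function's name and arguments while ignoring which subclass the call came from.

```python
import functools
from typing import Any, Callable, Dict, Tuple

# One shared in-memory store, standing in for an on-disk cache (assumption).
_CACHE: Dict[Tuple, Any] = {}


def naive_cache(func: Callable) -> Callable:
    """Hypothetical cache keyed on the function's qualified name and arguments only."""
    @functools.wraps(func)
    def wrapper(self, *args, **kwargs):
        # func.__qualname__ names the class where the method was DEFINED,
        # so every subclass that inherits the method produces the same key.
        key = (func.__qualname__, args, tuple(sorted(kwargs.items())))
        if key not in _CACHE:
            _CACHE[key] = func(self, *args, **kwargs)
        return _CACHE[key]
    return wrapper


def class_aware_cache(func: Callable) -> Callable:
    """Same idea, but the key also includes the concrete class of `self`."""
    @functools.wraps(func)
    def wrapper(self, *args, **kwargs):
        key = (type(self).__qualname__, func.__name__, args,
               tuple(sorted(kwargs.items())))
        if key not in _CACHE:
            _CACHE[key] = func(self, *args, **kwargs)
        return _CACHE[key]
    return wrapper


class StatsTable:
    """Hypothetical base class whose fetch method is shared by all subclasses."""
    kind = "unknown"

    @naive_cache
    def fetch(self, season: int) -> str:
        return f"{self.kind} stats for {season}"

    @class_aware_cache
    def fetch_fixed(self, season: int) -> str:
        return f"{self.kind} stats for {season}"


class BattingTable(StatsTable):
    kind = "batting"


class PitchingTable(StatsTable):
    kind = "pitching"


print(BattingTable().fetch(2020))         # batting stats for 2020
print(PitchingTable().fetch(2020))        # batting stats for 2020 (stale cache hit)
print(PitchingTable().fetch_fixed(2020))  # pitching stats for 2020
```

If pybaseball's cache key is built along these lines, the second call would be served the first call's result, which matches the output in the report. Whether that is what actually happens in the library is still a guess on my part.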

I have 2 guesses of what might fix it:

but honestly I don't have a strong understanding of the caching, so I don't really know.