jldbc / pybaseball

Pull current and historical baseball statistics using Python (Statcast, Baseball Reference, FanGraphs)
MIT License

Change assumed end of season date from November 1 to November 30 #323

Closed TK2575 closed 1 year ago

TK2575 commented 1 year ago

While familiarizing myself with this repo, I noticed a few places where the season was assumed to end no later than November 1. Several recent World Series have ended a few days after the 1st. Moving the assumed end date to the end of the month seemed like a good first issue.
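To illustrate the problem, here is a minimal sketch of a date check under both cutoffs. The names `OLD_SEASON_END`, `NEW_SEASON_END`, `assumed_season_end`, and `is_covered` are hypothetical and chosen for this example; they are not the identifiers used in pybaseball. The example date is the final game of the 2022 World Series, November 5, 2022.

```python
from datetime import date
from typing import Tuple

# Hypothetical (month, day) cutoffs for illustration; not pybaseball's actual names.
OLD_SEASON_END = (11, 1)
NEW_SEASON_END = (11, 30)


def assumed_season_end(year: int, month_day: Tuple[int, int] = NEW_SEASON_END) -> date:
    """Return the assumed last day of the season for the given year."""
    month, day = month_day
    return date(year, month, day)


def is_covered(game_day: date, month_day: Tuple[int, int]) -> bool:
    """Check whether a game date falls on or before the assumed season end."""
    return game_day <= assumed_season_end(game_day.year, month_day)


# Final game of the 2022 World Series.
last_ws_game_2022 = date(2022, 11, 5)

print(is_covered(last_ws_game_2022, OLD_SEASON_END))  # False: the game is cut off
print(is_covered(last_ws_game_2022, NEW_SEASON_END))  # True: the game is included
```

A November 30 cutoff leaves a margin of several weeks, so the default should not need adjusting each time the postseason runs a few days into November.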

tjburch commented 1 year ago

Thanks for the contribution. LGTM

I was worried that the scraped page covered only regular-season stats, so I ran the 2022 query with and without the extended end date. The results differ, so the page does in fact include the postseason, and we should extend the end date to the end of November:

>>> df.query("Name == 'Yordan Alvarez'")
              Name  Age  #days     Lev       Tm    G   PA   AB    R    H  2B  3B  HR  RBI  BB  IBB   SO  HBP  SH  SF  GDP  SB  CS     BA    OBP    SLG    OPS   mlbID
27  Yordan Alvarez   25    110  Maj-AL  Houston  145  609  510  102  152  32   2  39  106  84   11  118    8   0   7   12   1   1  0.298  0.401  0.598  0.999  670541
>>> df2.query("Name == 'Yordan Alvarez'")
              Name  Age  #days     Lev       Tm    G   PA   AB    R    H  2B  3B  HR  RBI  BB  IBB   SO  HBP  SH  SF  GDP  SB  CS     BA    OBP    SLG    OPS   mlbID
27  Yordan Alvarez   25    106  Maj-AL  Houston  148  622  522  104  154  32   2  40  111  84   11  122    9   0   7   12   1   1  0.295  0.397  0.594  0.991  670541

Scraping the following pages:

https://www.baseball-reference.com/leagues/daily.fcgi?user_team=&bust_cache=&type=b&lastndays=7&dates=fromandto&fromandto=2022-03-01.2022-11-01&level=mlb&franch=&stat=&stat_value=0

https://www.baseball-reference.com/leagues/daily.fcgi?user_team=&bust_cache=&type=b&lastndays=7&dates=fromandto&fromandto=2022-03-01.2022-11-30&level=mlb&franch=&stat=&stat_value=0
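For anyone repeating the check, the two URLs above differ only in the end date of the `fromandto` parameter. The following sketch rebuilds them from a start and end date using only the standard library. The helper name `daily_batting_url` is hypothetical, and the parameter set is copied from the URLs above rather than from any documented Baseball Reference API, so treat it as an assumption about what the page accepts.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.baseball-reference.com/leagues/daily.fcgi"


def daily_batting_url(start: str, end: str) -> str:
    """Build the daily batting URL for a date range (dates as YYYY-MM-DD).

    Hypothetical helper: the query parameters mirror the URLs in this thread.
    """
    params = {
        "user_team": "",
        "bust_cache": "",
        "type": "b",
        "lastndays": "7",
        "dates": "fromandto",
        "fromandto": f"{start}.{end}",  # start and end joined by a period
        "level": "mlb",
        "franch": "",
        "stat": "",
        "stat_value": "0",
    }
    return f"{BASE_URL}?{urlencode(params)}"


url_nov_1 = daily_batting_url("2022-03-01", "2022-11-01")
url_nov_30 = daily_batting_url("2022-03-01", "2022-11-30")
print(url_nov_1)
print(url_nov_30)
```

Fetching and parsing the pages is left out here; the sketch only shows that the end date is the single input that changes between the two requests.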