jldbc / pybaseball

Pull current and historical baseball statistics using Python (Statcast, Baseball Reference, FanGraphs)
MIT License

Home Column for team_game_logs seems to be flipped #236

Closed jzuhusky closed 2 years ago

jzuhusky commented 2 years ago

It seems that the Home column is returning the opposite of the value it should.

E.g.

>>> pybaseball.team_game_logs(2021, "NYY", "pitching")
     Game    Date   Home  Opp    Rslt    IP   H   R  ER  UER  BB  SO  HR  ...  SB  CS  AB  2B  3B  IBB  SH  SF  ROE  GDP  NumPlayers           Umpire                                       PitchersUsed
0       1   Apr 1  False  TOR   L,2-3  10.0   8   3   2    1   2  13   1  ...   0   1  36   1   0    0   0   0    0    1           5     Mark Carlson  G.Cole (99-56), C.Green (99), J.Loaisiga (99),...
1       2   Apr 3  False  TOR   W,5-3   9.0   8   3   2    1   3  11   1  ...   2   0  33   0   0    0   0   0    0    2           5       James Hoye  C.Kluber (99-48), J.Loaisiga (1-W), L.Luetge (...
2       3   Apr 4  False  TOR   L,1-3   9.0   5   3   3    0   2   5   2  ...   0   0  33   1   0    0   0   0    1    0           2     Jordan Baker                    D.German (99-40-L), M.King (99)
3       4   Apr 5  False  BAL   W,7-0   9.0   4   0   0    0   2  13   0  ...   1   0  31   0   0    0   0   0    0    0           3     Sam Holbrook  J.Montgomery (99-71-W), L.Cessa (99), A.Chapma...
4       5   Apr 6  False  BAL   W,7-2   9.0   7   2   2    0   0  14   1  ...   1   0  34   2   0    0   0   0    0    0           3     Marty Foster         G.Cole (4-82-W), C.Green (2), L.Luetge (2)
..    ...     ...    ...  ...     ...   ...  ..  ..  ..  ...  ..  ..  ..  ...  ..  ..  ..  ..  ..  ...  ..  ..  ...  ...         ...              ...                                                ...
137   138   Sep 7  False  TOR   L,1-5   9.0   7   5   4    1   4   7   3  ...   0   0  31   1   0    0   0   2    0    1           5     Ryan Blakney  G.Cole (5-41-L), A.Abreu (1), J.Rodriguez (1),...
138   139   Sep 8  False  TOR   L,3-6   9.0   7   6   5    1  11  13   1  ...   1   0  32   0   1    0   0   1    0    0           7    Edwin Moscoso  L.Gil (21-45), L.Luetge (1), J.Rodriguez (0), ...
139   140   Sep 9  False  TOR   L,4-6   9.0  13   6   5    1   4   9   3  ...   1   0  39   3   0    0   0   0    0    1           4     Doug Eddings  N.Cortes (5-57), S.Romano (32-L), W.Peralta (0...
140   141  Sep 10   True  NYM  L,3-10   8.0  11  10   7    3   4  10   1  ...   0   1  33   2   0    0   0   1    0    0           4      Ted Barrett  J.Montgomery (5-25-L), J.Rodriguez (1), M.King...
141   142  Sep 11   True  NYM   W,8-7   9.0  11   7   7    0   6  11   2  ...   1   0  37   2   1    0   1   0    0    0           6  Angel Hernandez  C.Kluber (5-40), L.Luetge (2), C.Green (2-BSv)...

[142 rows x 33 columns]

I know for sure that on Sept 10th and Sept 11th the Mets and Yankees were playing at Citi Field (NYY is away), yet Home is True for those rows even though the query asked for NYY data, not Mets data.

I'm opening this issue for anyone to pick up if they want to (I might get to it if I have some time).

I would imagine the offending line is here: https://github.com/jldbc/pybaseball/blob/master/pybaseball/team_game_logs.py#L32

data["Home"] = ~data["Home"].isnull() # '@' if away, empty if home

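An incorrect assumption about the incoming data is likely being made: if the comment is right that the raw value is '@' for away games and empty for home games, then ~isnull() is True for away games, which is the opposite of what Home should mean.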
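
To illustrate the suspected inversion, here is a minimal sketch using a small hand-made DataFrame. The frame is hypothetical, not real scraped output, and it assumes the code comment is accurate ('@' for away, missing for home). The `proposed` line is only one possible fix, not necessarily what the maintainers would choose.

```python
import numpy as np
import pandas as pd

# Hypothetical raw frame: '@' marks an away game, a missing value marks a home game.
# This mirrors the comment on the quoted line; it is not real Baseball Reference data.
data = pd.DataFrame({
    "Opp": ["TOR", "NYM", "BAL"],
    "Home": [np.nan, "@", np.nan],
})

# Expression from the quoted line: True where the raw value is present, i.e. away games.
current = ~data["Home"].isnull()

# Possible fix (an assumption): a game is at home exactly when the raw value is missing.
proposed = data["Home"].isnull()

print(current.tolist())   # [False, True, False]
print(proposed.tolist())  # [True, False, True]
```

Under that assumption, the NYM row (the away game) comes out as Home == True with the current expression, which matches the Sept 10 and Sept 11 rows in the output above.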