panzarino / mlbgame

A Python API to retrieve and read MLB GameDay data
http://panz.io/mlbgame/
MIT License

Game Attendance Data Availability? #68

Closed msussman closed 6 years ago

msussman commented 6 years ago

I wanted to reach out to see whether attendance data is available through the API.

panzarino commented 6 years ago

@msussman I'll poke around and see what data MLB provides.

ajbowler commented 6 years ago

I'm unable to find anything in the XML files around attendance, though the venue name is widely available.

ajbowler commented 6 years ago

I'm digging around in their new Stats API (statsapi.mlb.com/docs), but the lack of documentation means I'm not getting very far.

I did find attendance in their live game endpoint, buried in an info property. Here's a Dodgers @ Marlins game from 2016:

 "info": [
        {
          "label": "Game Scores",
          "value": "Kershaw 41; Fernandez 84."
        },
        {
          "label": "HBP",
          "value": "Suzuki, I (by Fields, J)."
        },
        {
          "label": "Pitches-strikes",
          "value": "Kershaw 66-46; Coleman 10-6; Norris, B 15-7; Howell 1-1; Fields, J 9-3; Avilan 20-12; Ravin 17-11; Liberatore 7-5; Fernandez 102-66; Barraclough 14-11; Rodney 16-11; Ramos 11-8."
        },
        {
          "label": "Groundouts-flyouts",
          "value": "Kershaw 3-1; Coleman 1-1; Norris, B 1-1; Howell 0-0; Fields, J 0-0; Avilan 3-0; Ravin 1-0; Liberatore 0-2; Fernandez 4-1; Barraclough 2-0; Rodney 0-0; Ramos 0-1."
        },
        {
          "label": "Batters faced",
          "value": "Kershaw 14; Coleman 3; Norris, B 4; Howell 1; Fields, J 2; Avilan 5; Ravin 3; Liberatore 3; Fernandez 27; Barraclough 4; Rodney 3; Ramos 3."
        },
        {
          "label": "Inherited runners-scored",
          "value": "Howell 2-1; Fields, J 2-1; Avilan 3-0; Ramos 2-0."
        },
        {
          "label": "Umpires",
          "value": "HP: Brian Knight. 1B: Tony Randazzo. 2B: Bill Miller. 3B: Tom Woodring."
        },
        {
          "label": "Weather",
          "value": "77 degrees, roof closed."
        },
        {
          "label": "Wind",
          "value": "0 mph, None."
        },
        {
          "label": "First pitch",
          "value": "7:11 PM."
        },
        {
          "label": "T",
          "value": "3:08."
        },
        {
          "label": "Att",
          "value": "22,940."
        },
        {
          "label": "Venue",
          "value": "Marlins Park"
        },
        {
          "label": "September 9, 2016"
        }
      ],

It might be available through some smaller endpoints as this one is ENORMOUS, but would require some more research. As it is, I'm thinking this would be a good feature for mlbgame v3 (Stats API usage) since this info doesn't appear to be available in their XML files.
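As a rough sketch of how that value could be pulled out of the info list above: the snippet below assumes the list has already been decoded from JSON into Python dicts. The helper name `find_attendance` is hypothetical and is not part of mlbgame or the Stats API.

```python
from typing import Optional


def find_attendance(info: list) -> Optional[int]:
    """Return the attendance from label/value entries as an int, or None.

    Hypothetical helper: it only assumes the shape shown in the excerpt,
    where the entry labelled "Att" holds a value like "22,940.".
    """
    for entry in info:
        if entry.get("label") == "Att":
            raw = entry.get("value", "")
            # Drop the thousands separator and the trailing period.
            digits = raw.replace(",", "").rstrip(".").strip()
            return int(digits) if digits.isdigit() else None
    return None


# A few entries copied from the excerpt above.
info = [
    {"label": "T", "value": "3:08."},
    {"label": "Att", "value": "22,940."},
    {"label": "Venue", "value": "Marlins Park"},
    {"label": "September 9, 2016"},
]

print(find_attendance(info))  # 22940
```

Entries without a "value" key, like the date entry, are skipped because only the "Att" label is matched.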

trevor-viljoen commented 6 years ago

It's available through rawboxscore.xml. Here's a random example: http://gd2.mlb.com/components/game/mlb/year_2017/month_04/day_16/gid_2017_04_16_milmlb_cinmlb_1/rawboxscore.xml

<boxscore wind="16 mph, R to L" game_type="R" venue_name="Great American Ball Park" attendance="12,625" home_sport_code="mlb" official_scorer="Mike Cameron" game_pk="490277" date="April 16, 2017" status_ind="F" home_league_id="104" elapsed_time="2:56" game_id="2017/04/16/milmlb-cinmlb-1" venue_id="2602" start_time="1:10 PM" weather="73 degrees, overcast" gameday_sw="P">
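
A minimal sketch of reading that attribute with the standard library, assuming the file has already been downloaded. The sample string is a shortened copy of the element above, closed by hand so it parses on its own; `read_attendance` is a hypothetical name, not an mlbgame function.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the <boxscore> element, closed so it is well-formed.
SAMPLE = (
    '<boxscore venue_name="Great American Ball Park" attendance="12,625" '
    'game_pk="490277" elapsed_time="2:56" start_time="1:10 PM" '
    'weather="73 degrees, overcast"></boxscore>'
)


def read_attendance(xml_text):
    """Return the root element's attendance attribute as an int, or None."""
    root = ET.fromstring(xml_text)
    raw = root.attrib.get("attendance")
    if raw is None:
        return None
    cleaned = raw.replace(",", "")  # "12,625" -> "12625"
    return int(cleaned) if cleaned.isdigit() else None


print(read_attendance(SAMPLE))  # 12625
```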

panzarino commented 6 years ago

@trevor-viljoen If you have time, could you go ahead and add that in a PR? I don't really have much time, but I could get to it if you can't.

trevor-viljoen commented 6 years ago

@panzarino I'll try to find some time to do it this week. I'll also take a look at rawboxscore vs boxscore and see how different they are from each other. The fix might be as simple as using rawboxscore instead of boxscore.
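For illustration, swapping one file for the other could come down to changing a filename in the URL. The sketch below follows the URL pattern from the example link earlier in the thread. The helper name is hypothetical, and it is an assumption that boxscore.xml sits in the same game directory as rawboxscore.xml.

```python
BASE_URL = "http://gd2.mlb.com/components/game/mlb"


def game_file_url(game_id, filename="rawboxscore.xml"):
    """Build a GameDay file URL from an underscore-style game id.

    Hypothetical helper; expects ids like "2017_04_16_milmlb_cinmlb_1".
    """
    year, month, day = game_id.split("_")[:3]
    return (
        f"{BASE_URL}/year_{year}/month_{month}/day_{day}"
        f"/gid_{game_id}/{filename}"
    )


print(game_file_url("2017_04_16_milmlb_cinmlb_1"))
# Assumed location of the other file, not verified here:
print(game_file_url("2017_04_16_milmlb_cinmlb_1", "boxscore.xml"))
```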

Pertempto commented 6 years ago

Is anyone working on this? I've done some work with MLB's rawboxscore.xml and boxscore.xml in another project. I'd be interested in doing this.

ajbowler commented 6 years ago

Have at it, I could use another release soon.

Pertempto commented 6 years ago

Which class should the attendance attribute go in? Each of the classes is associated with a corresponding XML file in the MLB API: Overview goes with linescore.xml, GameBoxScore goes with boxscore.xml, and GameScoreboard goes with scoreboard.xml. I think attendance data would fit best in the Overview class. Where were you expecting it to go?

panzarino commented 6 years ago

@Pertempto I think that it would fit well with the other stats provided by the Overview class.

Pertempto commented 6 years ago

Great! I'm working on it now, and I'll probably have a pull request in the next few hours.

Pertempto commented 6 years ago

I've implemented the attendance feature, but I was wondering if I should add all the top-level attributes from rawboxscore.xml to the game overviews. Here is an example. This would add useful data like the weather, wind, elapsed time, and exact start time. The only problem I found is that rawboxscore.xml includes a venue_name attribute, while the Overview class already has a venue attribute. Is it bad to have two attributes with the same value? Maybe I should add all the attributes from rawboxscore.xml and remove the venue_name attribute.

panzarino commented 6 years ago

@Pertempto It would be great if you could add that. It is fine to have duplicate attributes, just report everything.
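A sketch of what "add everything, keep duplicates" might look like. A plain dict stands in for the Overview data here, which is an assumption about its shape and not how mlbgame necessarily stores it, and `merge_rawboxscore` is a hypothetical name. Existing keys are left untouched, so venue and venue_name simply coexist.

```python
import xml.etree.ElementTree as ET


def merge_rawboxscore(overview, xml_text):
    """Copy every top-level rawboxscore attribute into a copy of overview.

    Keys already present in overview are not overwritten.
    """
    merged = dict(overview)
    for name, value in ET.fromstring(xml_text).attrib.items():
        merged.setdefault(name, value)
    return merged


# Shortened, hand-closed copy of the <boxscore> element quoted earlier.
xml_text = (
    '<boxscore venue_name="Great American Ball Park" attendance="12,625" '
    'weather="73 degrees, overcast" wind="16 mph, R to L" '
    'elapsed_time="2:56" start_time="1:10 PM"></boxscore>'
)

overview = {"venue": "Great American Ball Park"}
merged = merge_rawboxscore(overview, xml_text)
print(merged["venue"], "|", merged["venue_name"], "|", merged["attendance"])
```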

Pertempto commented 6 years ago

@panzarino Where am I supposed to "report everything"? Is it as simple as adding the new attributes to the Overview docstring or do I need to document these new attributes somewhere else as well?

Pertempto commented 6 years ago

I've created a pull request with all the new attributes. #71

panzarino commented 6 years ago

Merged

msussman commented 6 years ago

Thanks all. When are you planning on releasing a new version?

ajbowler commented 6 years ago

I could definitely use a 2.5. Only missing piece from the LED board before the regular season starts is the probable starters which is just waiting for a new tag.

panzarino commented 6 years ago

I'll work on updating a few things and hopefully pushing out a new release in the coming days.

msussman commented 6 years ago

Sounds great! Looking forward to checking out the update.

panzarino commented 6 years ago

@msussman Sorry to disappoint, but I just looked at my schedule and I have almost no time this week so I'll have to push it back to sometime next week.

msussman commented 6 years ago

No worries. I won't need it in earnest until next month.

msussman commented 6 years ago

@panzarino, I just updated to the new release and am finding some issues with the attendance implementation.

  1. Mixed Data Types: When attendance data is missing due to a doubleheader, the attendance attribute is set to 0 (e.g. game ID 2011/07/02/pitmlb-wasmlb-1, 'attendance': 0), but when it's populated it's a string (e.g. '2011/07/02/pitmlb-wasmlb-2', 'attendance': '39,636').

  2. Game IDs not found with the Game_Overview method: I had previously pulled all Nationals games for 2010-2018 without issue, but with the new release quite a few games make the Game_Overview method raise "ValueError: Could not find a game with that id." Example: 2010_04_03_bosmlb_wasmlb_1
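
Until the types are made consistent in the library, a caller-side workaround for the first point could look like the sketch below. Both helper names are hypothetical. Whether a 0 should instead be treated as missing is a judgment call this sketch does not make. The second helper only converts between the two ID spellings seen in this thread and does not address the ValueError itself, whose cause is not established here.

```python
def normalize_attendance(value):
    """Coerce an attendance value to int, or None if it cannot be parsed."""
    if isinstance(value, int):
        return value
    if isinstance(value, str):
        cleaned = value.replace(",", "").strip()
        if cleaned.isdigit():
            return int(cleaned)
    return None


def to_underscore_id(game_id):
    """Convert '2011/07/02/pitmlb-wasmlb-1' to '2011_07_02_pitmlb_wasmlb_1'."""
    return game_id.replace("/", "_").replace("-", "_")


print(normalize_attendance(0))         # 0
print(normalize_attendance("39,636"))  # 39636
print(to_underscore_id("2011/07/02/pitmlb-wasmlb-2"))
```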

panzarino commented 6 years ago

@Pertempto could you look into this?

Pertempto commented 6 years ago

@panzarino I just updated my local repo; hopefully I'll be able to look at it next week.

Pertempto commented 6 years ago

Sorry, but I was busy last week and wasn't able to work on this. Someone else might want to do this because I can't promise that I'll have the time.