infochimps-customers / hackboxen-community

Infochimps' Public Hackboxen -- Archiving the World's Data Together
http://www.infochimps.com

Pennant Data Output Format #3

Open kornypoet opened 13 years ago

kornypoet commented 13 years ago
team_stat_record =
{
'team_id'             => 'TEX',
'team_name'           => 'Rangers',
'team_location'       => 'Texas',
'primary_color'       => 'Navy Blue',
'secondary_color'     => 'Red',
'primary_color_hex'   => '0x04307E',
'secondary_color_hex' => '0xB9161C',
'home_stadium'        => 'Ameriquest Field in Arlington',
'league'              => 'AL',
'division'            => 'W',
'seasons'             => [
  {
    'season'            => 1972,
    'wins'              => 54,
    'losses'            => 100,
    'division_series'   => 0,   
    'league_rank'       => 4,
    'division_rank'     => 1
  },
  ...
  ...
  ...
  ]
}

season_stat_record =
{
'season'               => 2010,
'team_id'              => 'TEX',
'team_name'            => 'Rangers',
'team_location'        => 'Texas',
'primary_color'        => 'Navy Blue',
'secondary_color'      => 'Red',
'primary_color_hex'    => '0x04307E',
'secondary_color_hex'  => '0xB9161C',
'home_stadium'         => 'Ameriquest Field in Arlington',
'league'               => 'AL',
'division'             => 'W',
'league_rank'          => 4,
'division_rank'        => 1,
'AB'                   => 746,
'BA'                   => 0.298,
'E'                    => 56,
...
...
...
'relative_AB'          => 0.460,
'relative_BA'          => 0.672,
'relative_E'           => 0.123,
...
...
...
'regular_season_games' => [
  {
  'game_date'            => 20100531,
  'game_type'            => 'away',
  'opponent'             => 'ANA',
  'home_score'           => 1,
  'away_score'           => 3,
  'game_result'          => 'L',
  'double_header'        => false
  },
  ...
  ...
  ...
  ],
'post_season_games'    => [
  ...
  ...
  ...
  ]
}
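As a sketch of how `game_result` relates to the score fields in each `regular_season_games` entry: this assumes `home_score` and `away_score` are the home and away teams' runs, and that `game_type` says whether the record's own team was home or away. `derive_game_result` is a hypothetical helper, not part of the proposed format.

```ruby
# Hypothetical helper: derives 'W' or 'L' for the team a season record
# describes, from one entry of 'regular_season_games'.
# Assumes 'home_score'/'away_score' are the home and away teams' runs and
# 'game_type' is 'home' or 'away' from this team's point of view.
def derive_game_result(game)
  home = game.fetch('home_score')
  away = game.fetch('away_score')

  team_runs, opponent_runs =
    case game.fetch('game_type')
    when 'home' then [home, away]
    when 'away' then [away, home]
    else raise ArgumentError, "unknown game_type: #{game['game_type'].inspect}"
    end

  # The format only defines 'W' and 'L', so a tied score is treated as bad input.
  raise ArgumentError, 'tied score has no W/L result' if team_runs == opponent_runs

  team_runs > opponent_runs ? 'W' : 'L'
end

game = { 'game_type' => 'home', 'home_score' => 5, 'away_score' => 2 }
puts derive_game_result(game)  # prints W
```

Deriving the field this way keeps `game_result` consistent with the scores instead of copying it separately from the source files.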

game_stat_record =
{
'game_id'         => 'TEX201005310',
'date'            => 20100531,
'day_of_the_week' => 'Monday',
'start_time'      => '7:19 PM EST',
'attendance'      => 54322,
'stadium'         => 'Ameriquest Field in Arlington',
'location'        => 'Arlington, TX',
'home_team_stats' =>
  {
  'team_id'         => 'TEX',
  'team_name'       => 'Rangers',
  'team_location'   => 'Texas',
  'league_rank'     => 4,
  'division_rank'   => 1,
  'runs'            => 1,
  'hits'            => 12,
  'errors'          => 3,
  'left_on_base'    => 6,
  'pitching'        => [
    {
    'player_id'       => 'cramb001',
    'first_name'      => 'Bobby',
    'last_name'       => 'Cramer',
    'batters_faced'   => 0,
    'hits'            => 0,
    'walks'           => 0,
    'rbi'             => 0
    },
    ...
    ...
    ...
    ],
  'batting'         => [
    {
     'player_id'     => 'davir003',
     'first_name'    => 'Rajai',
     'last_name'     => 'Davis',
     'at_bats'       => 5,
     'hits'          => 2,
     'walks'         => 0,
     'rbi'           => 1
    },
    ...
    ...
    ...
    ],
  },
'away_team_stats' => {
  ...
  ...
  ...
  },
'innings'         => [
  {
  'number'            => 1,
  'home_play_by_play' => [
    {
    'player_id'         => 'davir003',
    'pitcher_id'        => 'cramb001',
    'balls'             => 1,
    'strikes'           => 0,
    'result'            => 'Generic Out',
    'description'       => 'Rajai Davis hit into an out.',
    'home_score'        => 1,
    'away_score'        => 0
    },
    ...
    ...
    ...
    ],
  'away_play_by_play' => [
    ...
    ...
    ...
    ],
  },
  ...
  ...
  ...
  ]
}
kornypoet commented 13 years ago

There has been some discussion of splitting the game_stats into two API calls: a general one, and a second one containing the innings and play-by-play. I will post more as the discussion develops.
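A minimal sketch of that split, assuming both payloads keep `game_id` so a client can join them. The function name and the shape of the second payload are hypothetical; nothing has been decided yet.

```ruby
# Hypothetical sketch: splits one game_stat_record hash into a general game
# summary and a separate innings / play-by-play payload.
# Both keep 'game_id' so they can be joined again.
def split_game_record(game)
  summary = game.reject { |key, _| key == 'innings' }
  play_by_play = {
    'game_id' => game.fetch('game_id'),
    'innings' => game.fetch('innings', [])
  }
  [summary, play_by_play]
end

game = {
  'game_id' => 'TEX201005310',
  'date'    => 20100531,
  'innings' => [{ 'number' => 1, 'home_play_by_play' => [], 'away_play_by_play' => [] }]
}
summary, play_by_play = split_game_record(game)
p summary.keys       # prints ["game_id", "date"]
p play_by_play.keys  # prints ["game_id", "innings"]
```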

kornypoet commented 13 years ago

@vaniver

vaniver commented 13 years ago

home_stadium can change from season to season. In the team_record, do we only want the last season's home stadium, with no information on what the stadium was previously called?

kornypoet commented 13 years ago

Yeah, there were some issues with this, but I felt that the year-by-year record was getting too big, so I pulled all of that information back up to the top level. What we need is a note in the documentation stating that the colors, leagues, divisions, and stadiums all come from current data, and that values that changed over time are still reflected in the season API.
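One way to fill the top-level team fields from current data is to copy them from the most recent season record, leaving each season record with its own historical values. This is a sketch only: `current_team_attributes`, the field list, and the stadium names in the usage are hypothetical.

```ruby
# Hypothetical sketch: takes the top-level team fields from the most recent
# season_stat_record, so they reflect current data.
CURRENT_TEAM_FIELDS = %w[
  team_name team_location primary_color secondary_color
  primary_color_hex secondary_color_hex home_stadium league division
].freeze

def current_team_attributes(season_records)
  latest = season_records.max_by { |record| record.fetch('season') }
  raise ArgumentError, 'no season records' if latest.nil?

  # Hash#slice (Ruby 2.5+) keeps only the listed keys that are present.
  latest.slice(*CURRENT_TEAM_FIELDS)
end

# Placeholder stadium names, for illustration only.
seasons = [
  { 'season' => 2005, 'home_stadium' => 'Old Park', 'league' => 'AL', 'wins' => 79 },
  { 'season' => 2010, 'home_stadium' => 'New Park', 'league' => 'AL', 'wins' => 90 }
]
puts current_team_attributes(seasons)['home_stadium']  # prints New Park
```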

vaniver commented 13 years ago

If we're pulling the game_id out of the season_record object, are we also pulling it out of the season_stats_catalog_entry description?

kornypoet commented 13 years ago

Yeah, each of the Icss descriptions might need a little cleanup.

vaniver commented 13 years ago

The pitching player stats are broken. I've checked all of the game files, and none of them has a nonzero battersFaced. Should we drop all four int fields from the pitcher record?
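A check like the one described could be scripted over every pitcher record pulled from the game files, reporting which integer fields are 0 everywhere. `always_zero_fields` and the second sample record are hypothetical.

```ruby
# Hypothetical check: lists the integer fields that are 0 in every pitcher
# record. Such fields carry no information and are candidates for dropping.
def always_zero_fields(pitcher_records)
  int_fields = pitcher_records.flat_map do |record|
    record.select { |_, value| value.is_a?(Integer) }.keys
  end.uniq

  int_fields.select do |field|
    pitcher_records.all? { |record| record.fetch(field, 0) == 0 }
  end
end

pitchers = [
  { 'player_id' => 'cramb001', 'batters_faced' => 0, 'hits' => 0, 'walks' => 0, 'rbi' => 0 },
  { 'player_id' => 'hypo0001', 'batters_faced' => 0, 'hits' => 2, 'walks' => 1, 'rbi' => 0 }
]
p always_zero_fields(pitchers)  # prints ["batters_faced", "rbi"]
```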

Are all of the start times in the season XML files PM EST times?

kornypoet commented 13 years ago

That sucks about the pitching stats. Keep them in there for now, but I will raise the issue. You can double-check the time zone by comparing the start times to ESPN's official start times (which are in EST).

vaniver commented 13 years ago

The times appear to be local start times. My current solution is to just turn "207" into "2:07 PM"; would you prefer something else?
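The "207" to "2:07 PM" conversion could look like the sketch below. It assumes the last two digits are minutes, the leading digits are a 12-hour clock hour, and every game is a PM start; `format_start_time` is a hypothetical name.

```ruby
# Hypothetical helper: turns a raw start time such as "207" into "2:07 PM".
# Assumptions: the last two digits are minutes, the leading one or two
# digits are a 12-hour clock hour, and every game is a PM start. No time
# zone is attached, since the raw values appear to be local times.
def format_start_time(raw)
  digits = raw.to_s.strip
  raise ArgumentError, "unexpected start time: #{raw.inspect}" unless digits.match?(/\A\d{3,4}\z/)

  hours   = digits[0...-2].to_i
  minutes = digits[-2, 2]
  unless (1..12).cover?(hours) && minutes.to_i < 60
    raise ArgumentError, "start time out of range: #{raw.inspect}"
  end

  "#{hours}:#{minutes} PM"
end

puts format_start_time('207')  # prints 2:07 PM
puts format_start_time('705')  # prints 7:05 PM
```

Raising on unexpected input, rather than guessing, makes odd values in the XML files show up during testing.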

vaniver commented 13 years ago

Clarification: the league and division ranks stored in the game_stat_record should be as of the date of that game, not the final ranking for that year. [Y/n]

kornypoet commented 13 years ago

Yes to the time question; if you can't determine the time zone, just drop it. The game_stat_record should have the rankings as of that date, and the season_stat_record should have the final (latest) ranking for the season. The dates seem to line up perfectly in the rankings files.

vaniver commented 13 years ago

Well, it would be possible to have a hash of locations to time zones and figure out what to report the time as, but I'd much rather drop it.
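For reference, the "hash of locations to time zones" idea might look like this sketch. It assumes fixed daylight-saving UTC offsets, which cover most of the regular season; dates outside DST would need a real time zone database. The offsets table and `local_to_eastern` are hypothetical.

```ruby
# Hypothetical sketch of a location-to-time-zone hash.
# Assumption: fixed daylight saving offsets, valid for most of the season.
DST_UTC_OFFSETS = {
  'Arlington, TX' => '-05:00', # Central Daylight Time
  'Anaheim, CA'   => '-07:00'  # Pacific Daylight Time
}.freeze

EASTERN_DST_OFFSET = '-04:00' # Eastern Daylight Time

# date is a YYYYMMDD integer, as in the records above.
def local_to_eastern(date, hour, minute, location)
  offset = DST_UTC_OFFSETS.fetch(location) do
    raise KeyError, "no time zone known for #{location}"
  end
  year, month, day = date / 10_000, (date / 100) % 100, date % 100
  local = Time.new(year, month, day, hour, minute, 0, offset)
  local.getlocal(EASTERN_DST_OFFSET)
end

eastern = local_to_eastern(20100531, 19, 5, 'Arlington, TX')
puts eastern.strftime('%-I:%M %p')  # prints 8:05 PM
```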

The dates are almost perfect; postseason games aren't included, so for those I just used the season's last rankings. The engine is almost finished and is going through final testing now.
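The ranking rules agreed above can be sketched as two lookups over a rankings table keyed by date: rankings as of a game's date for game_stat_record, and the latest ranking for season_stat_record. The table shape, function names, and sample values are hypothetical.

```ruby
# Hypothetical sketch: rankings_by_date maps YYYYMMDD integers to rank hashes.
# Only dates on or before the game count, so a postseason date missing from
# the rankings files falls back to the season's last available ranking.
def ranks_as_of(rankings_by_date, game_date)
  on_or_before = rankings_by_date.keys.select { |date| date <= game_date }
  raise ArgumentError, "no ranking on or before #{game_date}" if on_or_before.empty?

  rankings_by_date.fetch(on_or_before.max)
end

# The season_stat_record uses the final (latest) ranking instead.
def final_ranks(rankings_by_date)
  raise ArgumentError, 'empty rankings' if rankings_by_date.empty?

  rankings_by_date.fetch(rankings_by_date.keys.max)
end

# Illustrative values only.
rankings = {
  20100530 => { 'league_rank' => 5, 'division_rank' => 2 },
  20100531 => { 'league_rank' => 4, 'division_rank' => 1 }
}
puts ranks_as_of(rankings, 20100530)['league_rank']  # prints 5
puts ranks_as_of(rankings, 20101015)['league_rank']  # prints 4 (postseason fallback)
puts final_ranks(rankings)['division_rank']          # prints 1
```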