This repository — a growing work in progress — feeds Dodgers Data Bot, a statistical dashboard about the LA Dodgers' performance.
The code runs an automated workflow to fetch, process and store the team's current standings along with historical game-by-game records dating back to 1958. It also collects batting and pitching data, among other statistics, for the same period. These records are processed and used to bake out the site using the Jekyll static site generator, in concert with GitHub Pages, with D3.js for charts.
The data is sourced from the heroes at Baseball Reference and consolidated into unified datasets for analysis and visualization purposes only. The resulting site is a non-commercial hobby project.
The repository includes Python scripts that perform the following daily operations for team standings, pitching and batting, by season:

- `scripts/01_fetch_process_standings.py`
- `scripts/02_fetch_process_batting.py`
- `scripts/03_viz_standings.py`
- `scripts/04_viz_batting.py`
- `scripts/05_fetch_process_pitching.py`
- `scripts/06-create-toplines-summary.py`
- `scripts/07_fetch_process_season_outcomes.py`
- `scripts/08_fetch_process_wins_losses_current.py`
- `scripts/09_fetch_process_historic_batting_gamelogs.py`
- `scripts/10_fetch_process_attendance.py`
- `11_fetch_process_historic_pitching_gamelogs.py`
`visuals` directory.

The repository uses GitHub Actions to run the scripts each day, ensuring the datasets remain up to date throughout the baseball season. The workflow includes the following steps:
To use this repository for your own tracking or analysis of the Dodgers or another team, follow these steps:
- `AWS_ACCESS_KEY_ID`: Your AWS Access Key ID.
- `AWS_SECRET_ACCESS_KEY`: Your AWS Secret Access Key.

The processed datasets, which aren't all documented below yet, are uploaded to an AWS S3 bucket.
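
Since the S3 upload depends on both credentials being present, a quick check before running the scripts can save a failed run. The sketch below is only an illustration: the two variable names come from the setup notes above, while the function name `missing_aws_vars` and the idea of reading them from environment variables are assumptions, not something this repository is confirmed to do.

```python
import os

# Variable names are taken from the setup notes above. Everything else in
# this sketch (function name, environment-variable lookup) is hypothetical.
REQUIRED_AWS_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")


def missing_aws_vars(env=None):
    """Return the names of required AWS variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_AWS_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_aws_vars()
    if missing:
        print("Missing credentials: " + ", ".join(missing))
    else:
        print("AWS credentials found; uploads to S3 can proceed.")
```

Passing a plain dictionary instead of the real environment makes the check easy to try without touching actual credentials.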
Latest season summary
Data structure: Each row represents a statistic for the latest point in the season
Stat | Value | Category |
---|---|---|
wins | 15 | standings |
losses | 11 | standings |
record | 15-11 | standings |
win_pct | 57% | standings |
win_pct_decade_thispoint | 57% | standings |
runs | 139 | standings |
runs_against | 112 | standings |
run_differential | 27 | standings |
home_runs | 30 | batting |
home_runs_game | 1.15 | batting |
home_runs_game_last | 1.54 | batting |
home_runs_game_decade | 1.36 | batting |
stolen_bases | 16 | batting |
stolen_bases_game | 0.62 | batting |
stolen_bases_decade_game | 0.49 | batting |
batting_average | .268 | batting |
batting_average_decade | .253 | batting |
summary | The Dodgers have played 26 games this season compiling a 15-11 record — a winning percentage of 57%. The team's last game was an 11-2 away win to the WSN in front of 26,298 fans. The team has won 5 of its last 10 games. | standings |
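
Several rows in the sample above appear to be derived from the raw totals. The sketch below shows one way those relationships could be reproduced. It assumes that games played equals wins plus losses and that the per-game rates are simple totals divided by games; the function name `derive_rates` is hypothetical, and the project's own script may calculate these differently.

```python
# Totals copied from the sample table above.
summary = {
    "wins": 15,
    "losses": 11,
    "runs": 139,
    "runs_against": 112,
    "home_runs": 30,
    "stolen_bases": 16,
}


def derive_rates(stats):
    """Derive record, run differential and per-game rates from raw totals.

    Assumes games played = wins + losses (an assumption of this sketch).
    """
    games = stats["wins"] + stats["losses"]
    return {
        "games": games,
        "record": f"{stats['wins']}-{stats['losses']}",
        "run_differential": stats["runs"] - stats["runs_against"],
        "home_runs_game": round(stats["home_runs"] / games, 2),
        "stolen_bases_game": round(stats["stolen_bases"] / games, 2),
    }


derived = derive_rates(summary)
print(derived)
```

With the sample totals, the derived run differential and per-game rates line up with the `run_differential`, `home_runs_game` and `stolen_bases_game` rows shown in the table.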
Game-by-game standings, 1958 to present (10,400+ rows):
Data structure: Each row represents a game in a specific season
column_name | column_type | column_description |
---|---|---|
gm | int64 | Game number of season |
game_date | datetime64[ns] | Game date (%Y-%m-%d) |
home_away | object | Game location ("home" vs. "away") |
opp | object | Three-letter opponent abbreviation |
result | object | Dodgers result ("W" vs. "L") |
r | int64 | Dodgers runs scored |
ra | int64 | Runs allowed by Dodgers |
record | object | Dodgers season record after game |
wins | int64 | Dodgers wins after game |
losses | int64 | Dodgers losses after game |
win_pct | float64 | Dodgers winning percentage after game |
rank | object | Rank in division* |
gb | float64 | Games back in division* |
time | object | Game length |
time_minutes | int64 | Game length, in minutes |
day_night | object | Start time: "D" vs. "N" |
attendance | int64 | Home team attendance |
year | object | Season year |
* Before divisional reorganization in the National League in 1969, these figures represented league standings.
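
A few of the columns above are running or converted values rather than raw inputs. As a rough illustration, the sketch below builds `wins`, `losses`, `record` and `win_pct` from a sequence of `result` values, and converts a game length to `time_minutes`. The function names are hypothetical, and the `"H:MM"` format for the `time` column is an assumption; the actual processing scripts may work differently.

```python
def time_to_minutes(game_time):
    """Convert an "H:MM" game length such as "2:45" to whole minutes.

    The "H:MM" format is an assumption about the `time` column.
    """
    hours, minutes = game_time.split(":")
    return int(hours) * 60 + int(minutes)


def running_record(results):
    """Return one row per game with cumulative wins, losses, record and win_pct."""
    wins = losses = 0
    rows = []
    for gm, result in enumerate(results, start=1):
        if result == "W":
            wins += 1
        elif result == "L":
            losses += 1
        decided = wins + losses
        rows.append(
            {
                "gm": gm,
                "wins": wins,
                "losses": losses,
                "record": f"{wins}-{losses}",
                # Guard against a leading result that is neither "W" nor "L".
                "win_pct": round(wins / decided, 3) if decided else 0.0,
            }
        )
    return rows


# Hypothetical three-game stretch: win, loss, win.
rows = running_record(["W", "L", "W"])
print(rows[-1])
print(time_to_minutes("2:45"))
```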
Season-by-season batting statistics, by player, 1958 to present:
Data structure: Each row represents a player in a specific season
column_name | column_type | column_description |
---|---|---|
rk | object | Rank order at output |
pos | object | Position |
name | object | Player name |
age | object | Player age on June 30 |
g | int64 | Game appearances |
pa | int64 | Plate appearances* |
ab | int64 | At-bats* |
r | int64 | Runs scored |
h | int64 | Hits |
2b | int64 | Doubles |
3b | int64 | Triples |
hr | int64 | Home runs |
rbi | int64 | Runs batted in |
sb | int64 | Stolen bases |
cs | int64 | Caught stealing |
bb | int64 | Walks |
so | int64 | Strikeouts |
ba | float64 | Batting average |
obp | float64 | On-base percentage |
slg | float64 | Slugging percentage |
ops | float64 | OBP + SLG |
ops_plus | float64 | OPS adjusted to player's home park |
tb | int64 | Total bases |
gdp | int64 | Double plays grounded into |
hbp | int64 | Hit by pitch |
sh | int64 | Sacrifice hits |
sf | int64 | Sacrifice flies |
ibb | int64 | Intentional walks |
season | object | Season |
bats | object | Player's batting side (right, left, unknown) |
* An at-bat is when a player reaches base via a fielder's choice, hit or an error — not including catcher's interference — or when a batter is put out on a non-sacrifice. A plate appearance refers to each completed turn batting, regardless of the result.
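
The rate columns in this table follow from the counting columns through the standard baseball formulas. The sketch below shows those relationships using a made-up stat line; the function name `rate_stats` and the sample numbers are hypothetical, and the values in the dataset itself come from the source rather than from this calculation.

```python
def rate_stats(h, doubles, triples, hr, ab, bb, hbp, sf):
    """Compute total bases and standard rate stats from counting stats."""
    singles = h - doubles - triples - hr
    tb = singles + 2 * doubles + 3 * triples + 4 * hr

    # Denominator for on-base percentage: AB + BB + HBP + SF.
    obp_chances = ab + bb + hbp + sf

    ba = h / ab if ab else 0.0
    slg = tb / ab if ab else 0.0
    obp = (h + bb + hbp) / obp_chances if obp_chances else 0.0

    return {
        "tb": tb,
        "ba": round(ba, 3),
        "obp": round(obp, 3),
        "slg": round(slg, 3),
        "ops": round(obp + slg, 3),  # OPS = OBP + SLG
    }


# Hypothetical season line, not taken from the dataset.
line = rate_stats(h=150, doubles=30, triples=5, hr=20, ab=500, bb=60, hbp=5, sf=5)
print(line)
```

Note that sacrifice hits (`sh`) do not count as at-bats and are left out of the on-base percentage denominator, which is why they do not appear in the function.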
Other current season player batting statistics:
Data structure coming soon
Season-by-season batting at the team level, 1958 to present:
Data structure coming soon
Current season pitching:
Data structure coming soon
This project started as a few scrapers and has since grown well beyond its original documentation. More to come soon. If you have questions in the meantime, please let me know.
Contributions, suggestions and enhancements are welcome! Please open an issue or submit a pull request.
This project is open-sourced under the MIT License. See the LICENSE file for more details.