Open jacobakaye opened 4 months ago
That should be doable. There’s a lot of different options and I think it would be hard to code for all of them. I think what I will do soon in order is
It’s easier to work with a concrete goal in mind so maybe we can start by working toward the game preview table like that college basketball one you sent.
On Mon, Jun 10, 2024 at 6:54 PM jacobakaye @.***> wrote:
maybe like more splits? starters era? relievers era? stuff like that? what do you think?
— Reply to this email directly, view it on GitHub https://github.com/kndunlap/baseball-web-scrape/issues/1#issuecomment-2159569051, or unsubscribe https://github.com/notifications/unsubscribe-auth/AORVLBKOIQ6J5LJNZDJXJZ3ZGZDEDAVCNFSM6AAAAABD2QRC3GVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDCNJZGU3DSMBVGE . You are receiving this because you commented.Message ID: @.***>
Got it. I think you're right, and you know more than me. I thought about it, and for a table like this, the simpler the better.
How about this...
AVG OPS R HR SB ERA WHIP BB/9 K/9 H/9 HR/9
What do you think? I'm pretty sure all of these are available via baseballr:
team_hitting_leaders <- (fg_team_batter(qual = "y", startseason = 2024, endseason = 2024))
team_pitching_leaders <- (fg_team_pitcher(qual = "y", startseason = 2024, endseason = 2024))
That's awesome. That code is great for getting a ton of data. I'll work on it now.
On Tue, Jun 11, 2024 at 8:08 AM jacobakaye @.***> wrote:
Got it. I think you're right, and you know more than me. I thought about it, and for a table like this, the simpler the better.
How about this...
AVG OPS R HR SB ERA WHIP BB/9 K/9 H/9 HR/9
What do you think? I'm pretty sure all of these are available via baseballr:
team_hitting_leaders <- (fg_team_batter(qual = "y", startseason = 2024, endseason = 2024)) team_pitching_leaders <- (fg_team_pitcher(qual = "y", startseason = 2024, endseason = 2024))
— Reply to this email directly, view it on GitHub https://github.com/kndunlap/baseball-web-scrape/issues/1#issuecomment-2160862777, or unsubscribe https://github.com/notifications/unsubscribe-auth/AORVLBL3JWBQ7QDEBXAB5F3ZG4AFPAVCNFSM6AAAAABD2QRC3GVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDCNRQHA3DENZXG4 . You are receiving this because you commented.Message ID: @.***>
Cool. Thank you. I'll do my best to get the table done within a few days. It should come out very cool!
So I made the dataset. The code to access it is in collab.R on my github. It combines bbref, fangraphs and baseball savant pitching and hitting data. If it's a pitching stat then there is a _p at the end of the column name. And then at the bottom there's a function to pick two teams and then as many stats as you want. The list you gave me is an example. Let me know what I can improve on...
If you want to see all your stat choices, do colnames(final_dset) after running lines 1-192.
[image: image.png]
On Tue, Jun 11, 2024 at 9:37 AM jacobakaye @.***> wrote:
Cool. Thank you. I'll do my best to get the table done within a few days. It should come out very cool!
— Reply to this email directly, view it on GitHub https://github.com/kndunlap/baseball-web-scrape/issues/1#issuecomment-2161068836, or unsubscribe https://github.com/notifications/unsubscribe-auth/AORVLBIJAGSPAZTRDEH6TEDZG4KSHAVCNFSM6AAAAABD2QRC3GVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDCNRRGA3DQOBTGY . You are receiving this because you commented.Message ID: @.***>
What do you think about this? I would like to change the cropping but I cant figure it out right now.
That's awesome. It looks really professional. Do you want to share your code and maybe I can check out the cropping ( what specifically about the cropping though)?
On Tue, Jun 11, 2024 at 2:08 PM jacobakaye @.***> wrote:
What do you think about this? I would like to change the cropping but I cant figure it out right now. mlb_game_preview.png (view on web) https://github.com/kndunlap/baseball-web-scrape/assets/126515518/8dad629a-d6b9-4094-acac-6b8853bf8b27
— Reply to this email directly, view it on GitHub https://github.com/kndunlap/baseball-web-scrape/issues/1#issuecomment-2161524576, or unsubscribe https://github.com/notifications/unsubscribe-auth/AORVLBIA5E44SIO7K56UAXTZG5KL5AVCNFSM6AAAAABD2QRC3GVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDCNRRGUZDINJXGY . You are receiving this because you commented.Message ID: @.***>
Yes. I just would like it more vertically if that makes sense. It looks very stretched out. Feel free to make whatever changes you want.
# Load Packages -----------------------------------------------------------
library(tidyverse)
library(rvest)
library(baseballr)
library(dplyr)
library(tidyr)
library(gt)
library(rlang)
library(gtExtras)
library(cbbplotR)
# HITTING -----------------------------------------------------------------------------------------------
# Read in Baseball Savant -------------------------------------------------
savant_url <- "https://baseballsavant.mlb.com/league"
html_savant <- read_html(savant_url)
savant_tables <- html_savant |>
html_table()
# Read in and Clean bbref -------------------------------------------------
bbref_url <- "https://www.baseball-reference.com/leagues/majors/2024-standard-batting.shtml"
html_bbref <- read_html(bbref_url)
bbref_tables <- html_bbref |>
html_table()
bbref <- bbref_tables[[1]]
bbref <- bbref |>
slice(-31:-33)
# Clean Savant ---------------------------------------------------
stable1 <- savant_tables[[1]]
colnames(stable1) <- stable1[1,]
stable1 <- stable1 |> slice(-1)
stable1 <- stable1 |> slice(-31)
stable1
stable2 <- savant_tables[[2]]
stable2 <- stable2 |> slice(-31)
stable2
stable3 <- savant_tables[[3]]
stable3 <- stable3 |> slice(-31)
stable3
savant_table <- cbind(stable1, stable2, stable3)
savant_table <- savant_table[, !duplicated(names(savant_table))]
savant_table <- as.tibble(savant_table)
savant_table <- savant_table |> select(!27)
savant_table <- savant_table |> select(!1)
bbref <- bbref %>%
mutate(across(c(PA, AB, H, HR, BB, SO, `2B`, `3B`), ~as.numeric(gsub(",", "", .))))
savant_table <- savant_table %>%
mutate(across(c(PA, AB, H, HR, BB, SO, `2B`, `3B`), ~as.numeric(gsub(",", "", .))))
# Join and Clean Up ----------------------------------------------------------------
combined <- savant_table |>
full_join(bbref, join_by(AB, PA, H, HR, BB, SO, `2B`, `3B`))
dataset <- combined |>
relocate(Tm) |>
select(-c(Season, BA.y, OBP.y, SLG.y)) |>
rename(
BA = BA.x,
SLG = SLG.x,
OBP = OBP.x,
Team = Tm
)
dataset_final <- dataset |>
mutate(across(c(2:67), ~as.numeric(gsub(",", "", .))))
codes <- c("ATL", "CHW", "MIA", "WSN", "CLE", "LAA", "STL", "TOR", "MIN", "DET", "CIN", "BAL", "TBR", "COL", "SEA",
"TEX", "KCR", "NYM", "OAK", "SFG", "CHC", "PHI", "PIT", "MIL", "HOU", "ARI", "BOS", "NYY", "SDP", "LAD")
dataset2 <- dataset_final |>
mutate(team_name = codes) |>
relocate(team_name)
# Tack on and Clean Fangraphs ----------------------------------------------
fg_hitting <- (fg_team_batter(qual = "y", startseason = 2024, endseason = 2024))
fg_hitting <- fg_hitting |>
mutate(across(c(PA, AB, H, HR, BB, SO, `2B`, `3B`), ~as.numeric(gsub(",", "", .))))
fg_dataset <- dataset2 |>
full_join(fg_hitting, by = c("PA", "AB", "H", "HR", "BB", "SO")) |>
relocate(team_name.y) |>
rename(team_name = team_name.y) |>
rename(team_name.y = team_name.x) |>
select(-team_name.y)
hitter_final <- fg_dataset |>
select(!ends_with(".y")) |>
rename_with(~ str_remove(., "\\.x$"), ends_with(".x"))
# PITCHING -------------------------------------------------------------------------------------------
# Read in and Clean bbref -------------------------------------------------
bbref_url_p <- "https://www.baseball-reference.com/leagues/majors/2024-standard-pitching.shtml"
html_bbref_p <- read_html(bbref_url_p)
bbref_tables_p <- html_bbref_p |>
html_table()
bbref_p <- bbref_tables_p[[1]]
bbref_p <- bbref_p |>
slice(-31:-33)
# Clean Savant ---------------------------------------------------
stable4 <- savant_tables[[4]]
colnames(stable4) <- stable4[1,]
stable4 <- stable4 |> slice(-1)
stable4 <- stable4 |> slice(-31)
stable4
stable5 <- savant_tables[[5]]
stable5 <- stable5 |> slice(-31)
stable5
stable6 <- savant_tables[[6]]
stable6 <- stable6 |> slice(-31)
stable6
savant_table_p <- cbind(stable4, stable5, stable6)
savant_table_p <- savant_table_p[, !duplicated(names(savant_table_p))]
savant_table_p <- as.tibble(savant_table_p)
savant_table_p <- savant_table_p |> select(!27)
savant_table_p <- savant_table_p |> select(!1)
bbref_p <- bbref_p |>
mutate(across(c(`#P`:LOB), ~as.numeric(gsub(",", "", .))))
savant_table_p <- savant_table_p |>
mutate(across(c(PA:XWOBACON), ~as.numeric(gsub(",", "", .))))
# Join and Clean Up ----------------------------------------------------------------
combined_p <- savant_table_p |>
full_join(bbref_p, join_by(H, HR, BB, SO))
dataset_p <- combined_p |>
relocate(Tm) |>
select(!2)
dataset_final_p <- dataset_p |>
mutate(across(c(2:81), ~as.numeric(gsub(",", "", .))))
codes_p <- c("ATL", "CLE", "STL", "BAL", "PHI", "WSN", "SEA", "TEX", "DET", "MIN", "BOS", "TOR", "LAD", "MIL", "CIN",
"LAA", "NYM", "KCR", "CHC", "NYY", "PIT", "ARI", "MIA", "TBR", "SFG", "HOU", "COL", "OAK", "CHW", "SDP")
dataset2_p <- dataset_final_p |>
mutate(team_name = codes_p) |>
relocate(team_name)
# Tack on and Clean Fangraphs ----------------------------------------------
fg_pitching <- (fg_team_pitcher(qual = "y", startseason = 2024, endseason = 2024))
fg_dataset_p <- dataset2_p |>
full_join(fg_pitching, by = "team_name") |>
select(!ends_with(".y")) |>
rename_with(~ str_remove(., "\\.x$"), ends_with(".x")) |>
rename_with(~ paste0(., "_p")) |>
rename(
Team = Tm_p,
team_name = team_name_p
)
# JOIN HITTING AND PITCHING -----------------------------------------------
final_dset <- fg_dataset |>
full_join(fg_dataset_p, by = "team_name") |>
select(!ends_with(".y")) |>
rename_with(~ str_remove(., "\\.x$"), ends_with(".x")) |>
relocate(HR, ERA_p, FIP_p, OPS, .after = Team)
# Function ----------------------------------------------------------------
rank <- function(team1, team2, ...) {
stats <- ensyms(...)
fivetwo <- final_dset |>
select(team_name, !!!stats) |>
filter(team_name == {{team1}} | team_name == {{team2}}) |>
pivot_longer(cols = -team_name, names_to = "Stat", values_to = "Value") |>
pivot_wider(names_from = team_name, values_from = Value) |>
mutate(across(where(is.numeric), ~ signif(., 3)))
print(fivetwo)
}
# Run This ----------------------------------------------------------------
table <- rank("DET", "NYY", AVG, OPS, R, HR, SB, ERA_p, WHIP_p, BB9_p, K_9_p, H_9_p, HR_9_p)
rank <- function(team1, team2, column_names = list(), ...) {
stats <- ensyms(...)
formatted_date <- format(Sys.Date(), format = "%B %d, %Y")
fivetwo <- final_dset |>
select(team_name, !!!stats) |>
filter(team_name == !!team1 | team_name == !!team2) |>
pivot_longer(cols = -team_name, names_to = "Stat", values_to = "Value") |>
pivot_wider(names_from = team_name, values_from = Value) |>
mutate(across(where(is.numeric), ~ signif(., 3))) |>
mutate(
Stat = case_when(
Stat == "ERA_p" ~ "ERA",
Stat == "WHIP_p" ~ "WHIP",
Stat == "BB9_p" ~ "BB/9",
Stat == "K_9_p" ~ "K/9",
Stat == "H_9_p" ~ "H/9",
Stat == "HR_9_p" ~ "HR/9",
TRUE ~ Stat
)
)
# Create a gt table
table <- fivetwo |>
gt() %>%
gt_theme_538() %>%
tab_header(
title = md(paste("Game Preview:", team1, "vs", team2)),
subtitle = md(paste("**Key Statistics for the Season -", formatted_date, "**"))
) %>%
gt::tab_source_note(source_note = paste(format(Sys.Date(), format="%B %d, %Y"), "| data: baseballr")) %>%
cols_label(
Stat = "Statistic",
!!sym(team1) := column_names[[1]] %||% team1,
!!sym(team2) := column_names[[2]] %||% team2
)
# Conditional formatting for columns that exist
if ("AVG" %in% fivetwo$Stat) {
table <- table |>
fmt_number(
columns = c(team1, team2),
rows = Stat == "AVG",
decimals = 3,
drop_trailing_zeros = TRUE
)
}
if ("OPS" %in% fivetwo$Stat) {
table <- table |>
fmt_number(
columns = c(team1, team2),
rows = Stat == "OPS",
decimals = 3,
drop_trailing_zeros = TRUE
)
}
if ("R" %in% fivetwo$Stat) {
table <- table |>
fmt_number(
columns = c(team1, team2),
rows = Stat == "R",
decimals = 0
)
}
if ("HR" %in% fivetwo$Stat) {
table <- table |>
fmt_number(
columns = c(team1, team2),
rows = Stat == "HR",
decimals = 0
)
}
if ("SB" %in% fivetwo$Stat) {
table <- table |>
fmt_number(
columns = c(team1, team2),
rows = Stat == "SB",
decimals = 0
)
}
if ("ERA" %in% fivetwo$Stat) {
table <- table |>
fmt_number(
columns = c(team1, team2),
rows = Stat == "ERA",
decimals = 2
)
}
if ("WHIP" %in% fivetwo$Stat) {
table <- table |>
fmt_number(
columns = c(team1, team2),
rows = Stat == "WHIP",
decimals = 2
)
}
if ("BB/9" %in% fivetwo$Stat) {
table <- table |>
fmt_number(
columns = c(team1, team2),
rows = Stat == "BB/9",
decimals = 2
)
}
if ("K/9" %in% fivetwo$Stat) {
table <- table |>
fmt_number(
columns = c(team1, team2),
rows = Stat == "K/9",
decimals = 2
)
}
if ("H/9" %in% fivetwo$Stat) {
table <- table |>
fmt_number(
columns = c(team1, team2),
rows = Stat == "H/9",
decimals = 2
)
}
if ("HR/9" %in% fivetwo$Stat) {
table <- table |>
fmt_number(
columns = c(team1, team2),
rows = Stat == "HR/9",
decimals = 2
)
}
table <- table |>
tab_spanner(
label = "Teams",
columns = c(team1, team2)
) |>
tab_style(
style = cell_text(weight = "bold"),
locations = cells_column_labels(everything())
) |>
tab_style(
style = cell_borders(
sides = "all",
color = "grey",
weight = px(1)
),
locations = cells_body()
) |>
tab_options(
table.font.size = px(12),
table.width = pct(80),
column_labels.font.weight = "bold",
data_row.padding = px(5)
)
return(table)
}
# Custom column names
column_names <- list("New York Yankees", "Kansas City Royals")
# Call the function
table <- rank("NYY", "KCR", column_names, AVG, OPS, R, HR, SB, ERA_p, WHIP_p, BB9_p, K_9_p, H_9_p, HR_9_p)
# Print the table
table
gtsave(
"mlb_game_preview.png",
table,
device = "png",
width = 1600, # Adjust width as needed in pixels
height = 1200 # Adjust height as needed in pixels
)
library(dplyr)
library(tidyr)
library(gt)
library(rlang)
rank <- function(team1, team2, column_names = list(), ...) {
stats <- ensyms(...)
formatted_date <- format(Sys.Date(), format = "%B %d, %Y")
fivetwo <- final_dset |>
select(team_name, !!!stats) |>
filter(team_name == !!team1 | team_name == !!team2) |>
pivot_longer(cols = -team_name, names_to = "Stat", values_to = "Value") |>
pivot_wider(names_from = team_name, values_from = Value) |>
mutate(across(where(is.numeric), ~ signif(., 3))) |>
mutate(
Stat = case_when(
Stat == "ERA_p" ~ "ERA",
Stat == "WHIP_p" ~ "WHIP",
Stat == "BB9_p" ~ "BB/9",
Stat == "K_9_p" ~ "K/9",
Stat == "H_9_p" ~ "H/9",
Stat == "HR_9_p" ~ "HR/9",
TRUE ~ Stat
)
)
# Create a gt table
table <- fivetwo |>
gt() %>%
gt_theme_538() %>%
tab_header(
title = md(paste("Game Preview:", team1, "vs", team2)),
subtitle = md(paste("**Key Statistics for the Season -", formatted_date, "**"))
) %>%
gt::tab_source_note(source_note = paste(format(Sys.Date(), format="%B %d, %Y"), "| data: baseballr")) %>%
cols_label(
Stat = "Statistic",
!!sym(team1) := column_names[[1]] %||% team1,
!!sym(team2) := column_names[[2]] %||% team2
) %>%
# Conditional formatting to highlight cells with greater values
mutate(across(all_of(c(team1, team2)), as.numeric)) %>%
fmt_number(
columns = all_of(c(team1, team2)),
rows = Stat != "Statistic",
decimals = 3
) %>%
tab_spanner(
label = "Teams",
columns = c(team1, team2)
) %>%
tab_style(
style = cell_text(weight = "bold"),
locations = cells_column_labels(everything())
) %>%
tab_style(
style = cell_borders(
sides = "all",
color = "grey",
weight = px(1)
),
locations = cells_body()
) %>%
tab_options(
table.font.size = px(12),
table.width = pct(80),
column_labels.font.weight = "bold",
data_row.padding = px(5)
) %>%
# Apply conditional formatting using style() based on comparison
style(
columns = all_of(c(team1, team2)),
rows = Stat != "Statistic",
value = if_else(.x > .y, "background-color: #2ecc71; color: black;", "")
)
return(table)
}
# Custom column names
column_names <- list("New York Mets", "New York Yankees")
# Call the function
table <- rank("NYM", "NYY", column_names, AVG, OPS, R, HR, SB, ERA_p, WHIP_p, BB9_p, K_9_p, H_9_p, HR_9_p)
gtsave(table, filename = "mlb_game_preview.png")
Ok. Got a lot of work dumped on me this week so I will try to work on it. What do you want to use these tables for? I think the one you made is good for like a paper media guide but for a TV broadcast, maybe we can incorporate color.
On Tue, Jun 11, 2024 at 3:32 PM jacobakaye @.***> wrote:
Yes. I just would like it more vertically if that makes sense. It looks very stretched out. Feel free to make whatever changes you want.
Load Packages -----------------------------------------------------------
library(tidyverse) library(rvest) library(baseballr) library(dplyr) library(tidyr) library(gt) library(rlang) library(gtExtras) library(cbbplotR)
HITTING -----------------------------------------------------------------------------------------------
Read in Baseball Savant -------------------------------------------------
savant_url <- "https://baseballsavant.mlb.com/league"html_savant <- read_html(savant_url)savant_tables <- html_savant |> html_table()
Read in and Clean bbref -------------------------------------------------
bbref_url <- "https://www.baseball-reference.com/leagues/majors/2024-standard-batting.shtml"html_bbref <- read_html(bbref_url)bbref_tables <- html_bbref |> html_table() bbref <- bbref_tables[[1]] bbref <- bbref |> slice(-31:-33)
Clean Savant ---------------------------------------------------
stable1 <- savant_tables[[1]] colnames(stable1) <- stable1[1,]stable1 <- stable1 |> slice(-1)stable1 <- stable1 |> slice(-31)stable1 stable2 <- savant_tables[[2]]stable2 <- stable2 |> slice(-31)stable2 stable3 <- savant_tables[[3]]stable3 <- stable3 |> slice(-31)stable3 savant_table <- cbind(stable1, stable2, stable3) savant_table <- savant_table[, !duplicated(names(savant_table))]savant_table <- as.tibble(savant_table) savant_table <- savant_table |> select(!27)savant_table <- savant_table |> select(!1) bbref <- bbref %>% mutate(across(c(PA, AB, H, HR, BB, SO,
2B
,3B
), ~as.numeric(gsub(",", "", .)))) savant_table <- savant_table %>% mutate(across(c(PA, AB, H, HR, BB, SO,2B
,3B
), ~as.numeric(gsub(",", "", .))))Join and Clean Up ----------------------------------------------------------------
combined <- savant_table |> full_join(bbref, join_by(AB, PA, H, HR, BB, SO,
2B
,3B
)) dataset <- combined |> relocate(Tm) |> select(-c(Season, BA.y, OBP.y, SLG.y)) |> rename( BA = BA.x, SLG = SLG.x, OBP = OBP.x, Team = Tm ) dataset_final <- dataset |> mutate(across(c(2:67), ~as.numeric(gsub(",", "", .)))) codes <- c("ATL", "CHW", "MIA", "WSN", "CLE", "LAA", "STL", "TOR", "MIN", "DET", "CIN", "BAL", "TBR", "COL", "SEA", "TEX", "KCR", "NYM", "OAK", "SFG", "CHC", "PHI", "PIT", "MIL", "HOU", "ARI", "BOS", "NYY", "SDP", "LAD") dataset2 <- dataset_final |> mutate(team_name = codes) |> relocate(team_name)Tack on and Clean Fangraphs ----------------------------------------------
fg_hitting <- (fg_team_batter(qual = "y", startseason = 2024, endseason = 2024)) fg_hitting <- fg_hitting |> mutate(across(c(PA, AB, H, HR, BB, SO,
2B
,3B
), ~as.numeric(gsub(",", "", .)))) fg_dataset <- dataset2 |> full_join(fg_hitting, by = c("PA", "AB", "H", "HR", "BB", "SO")) |> relocate(team_name.y) |> rename(team_name = team_name.y) |> rename(team_name.y = team_name.x) |> select(-team_name.y) hitter_final <- fg_dataset |> select(!ends_with(".y")) |> rename_with(~ str_remove(., "\.x$"), ends_with(".x"))PITCHING -------------------------------------------------------------------------------------------
Read in and Clean bbref -------------------------------------------------
bbref_url_p <- "https://www.baseball-reference.com/leagues/majors/2024-standard-pitching.shtml"html_bbref_p <- read_html(bbref_url_p)bbref_tables_p <- html_bbref_p |> html_table() bbref_p <- bbref_tables_p[[1]] bbref_p <- bbref_p |> slice(-31:-33)
Clean Savant ---------------------------------------------------
stable4 <- savant_tables[[4]] colnames(stable4) <- stable4[1,]stable4 <- stable4 |> slice(-1)stable4 <- stable4 |> slice(-31)stable4 stable5 <- savant_tables[[5]]stable5 <- stable5 |> slice(-31)stable5 stable6 <- savant_tables[[6]]stable6 <- stable6 |> slice(-31)stable6 savant_table_p <- cbind(stable4, stable5, stable6) savant_table_p <- savant_table_p[, !duplicated(names(savant_table_p))]savant_table_p <- as.tibble(savant_table_p) savant_table_p <- savant_table_p |> select(!27)savant_table_p <- savant_table_p |> select(!1) bbref_p <- bbref_p |> mutate(across(c(
#P
:LOB), ~as.numeric(gsub(",", "", .)))) savant_table_p <- savant_table_p |> mutate(across(c(PA:XWOBACON), ~as.numeric(gsub(",", "", .))))Join and Clean Up ----------------------------------------------------------------
combined_p <- savant_table_p |> full_join(bbref_p, join_by(H, HR, BB, SO)) dataset_p <- combined_p |> relocate(Tm) |> select(!2) dataset_final_p <- dataset_p |> mutate(across(c(2:81), ~as.numeric(gsub(",", "", .)))) codes_p <- c("ATL", "CLE", "STL", "BAL", "PHI", "WSN", "SEA", "TEX", "DET", "MIN", "BOS", "TOR", "LAD", "MIL", "CIN", "LAA", "NYM", "KCR", "CHC", "NYY", "PIT", "ARI", "MIA", "TBR", "SFG", "HOU", "COL", "OAK", "CHW", "SDP") dataset2_p <- dataset_final_p |> mutate(team_name = codes_p) |> relocate(team_name)
Tack on and Clean Fangraphs ----------------------------------------------
fg_pitching <- (fg_team_pitcher(qual = "y", startseason = 2024, endseason = 2024)) fg_dataset_p <- dataset2_p |> full_join(fg_pitching, by = "team_name") |> select(!ends_with(".y")) |> rename_with(~ str_remove(., "\.x$"), ends_with(".x")) |> rename_with(~ paste0(., "_p")) |> rename( Team = Tm_p, team_name = team_name_p )
JOIN HITTING AND PITCHING -----------------------------------------------
final_dset <- fg_dataset |> full_join(fg_dataset_p, by = "team_name") |> select(!ends_with(".y")) |> rename_with(~ str_remove(., "\.x$"), ends_with(".x")) |> relocate(HR, ERA_p, FIP_p, OPS, .after = Team)
Function ----------------------------------------------------------------
rank <- function(team1, team2, ...) { stats <- ensyms(...)
fivetwo <- final_dset |> select(team_name, !!!stats) |> filter(team_name == {{team1}} | team_name == {{team2}}) |> pivot_longer(cols = -team_name, names_to = "Stat", values_to = "Value") |> pivot_wider(names_from = team_name, values_from = Value) |> mutate(across(where(is.numeric), ~ signif(., 3)))
print(fivetwo)
}
Run This ----------------------------------------------------------------
table <- rank("DET", "NYY", AVG, OPS, R, HR, SB, ERA_p, WHIP_p, BB9_p, K_9_p, H_9_p, HR_9_p)
rank <- function(team1, team2, column_names = list(), ...) { stats <- ensyms(...) formatted_date <- format(Sys.Date(), format = "%B %d, %Y")
fivetwo <- final_dset |> select(team_name, !!!stats) |> filter(team_name == !!team1 | team_name == !!team2) |> pivot_longer(cols = -team_name, names_to = "Stat", values_to = "Value") |> pivot_wider(names_from = team_name, values_from = Value) |> mutate(across(where(is.numeric), ~ signif(., 3))) |> mutate( Stat = case_when( Stat == "ERA_p" ~ "ERA", Stat == "WHIP_p" ~ "WHIP", Stat == "BB9_p" ~ "BB/9", Stat == "K_9_p" ~ "K/9", Stat == "H_9_p" ~ "H/9", Stat == "HR_9_p" ~ "HR/9", TRUE ~ Stat ) )
Create a gt table
table <- fivetwo |> gt() %>% gt_theme_538() %>% tab_header( title = md(paste("Game Preview:", team1, "vs", team2)), subtitle = md(paste("Key Statistics for the Season -", formatted_date, "")) ) %>% gt::tab_source_note(source_note = paste(format(Sys.Date(), format="%B %d, %Y"), "| data: baseballr")) %>% cols_label( Stat = "Statistic", !!sym(team1) := column_names[[1]] %||% team1, !!sym(team2) := column_names[[2]] %||% team2 )
Conditional formatting for columns that exist
if ("AVG" %in% fivetwo$Stat) { table <- table |> fmt_number( columns = c(team1, team2), rows = Stat == "AVG", decimals = 3, drop_trailing_zeros = TRUE ) } if ("OPS" %in% fivetwo$Stat) { table <- table |> fmt_number( columns = c(team1, team2), rows = Stat == "OPS", decimals = 3, drop_trailing_zeros = TRUE ) } if ("R" %in% fivetwo$Stat) { table <- table |> fmt_number( columns = c(team1, team2), rows = Stat == "R", decimals = 0 ) } if ("HR" %in% fivetwo$Stat) { table <- table |> fmt_number( columns = c(team1, team2), rows = Stat == "HR", decimals = 0 ) } if ("SB" %in% fivetwo$Stat) { table <- table |> fmt_number( columns = c(team1, team2), rows = Stat == "SB", decimals = 0 ) } if ("ERA" %in% fivetwo$Stat) { table <- table |> fmt_number( columns = c(team1, team2), rows = Stat == "ERA", decimals = 2 ) } if ("WHIP" %in% fivetwo$Stat) { table <- table |> fmt_number( columns = c(team1, team2), rows = Stat == "WHIP", decimals = 2 ) } if ("BB/9" %in% fivetwo$Stat) { table <- table |> fmt_number( columns = c(team1, team2), rows = Stat == "BB/9", decimals = 2 ) } if ("K/9" %in% fivetwo$Stat) { table <- table |> fmt_number( columns = c(team1, team2), rows = Stat == "K/9", decimals = 2 ) } if ("H/9" %in% fivetwo$Stat) { table <- table |> fmt_number( columns = c(team1, team2), rows = Stat == "H/9", decimals = 2 ) } if ("HR/9" %in% fivetwo$Stat) { table <- table |> fmt_number( columns = c(team1, team2), rows = Stat == "HR/9", decimals = 2 ) }
table <- table |> tab_spanner( label = "Teams", columns = c(team1, team2) ) |> tab_style( style = cell_text(weight = "bold"), locations = cells_column_labels(everything()) ) |> tab_style( style = cell_borders( sides = "all", color = "grey", weight = px(1) ), locations = cells_body() ) |> tab_options( table.font.size = px(12), table.width = pct(80), column_labels.font.weight = "bold", data_row.padding = px(5) )
return(table) }
Custom column namescolumn_names <- list("New York Yankees", "Kansas City Royals")
Call the functiontable <- rank("NYY", "KCR", column_names, AVG, OPS, R, HR, SB, ERA_p, WHIP_p, BB9_p, K_9_p, H_9_p, HR_9_p)
Print the tabletable
gtsave( "mlb_game_preview.png", table, device = "png", width = 1600, # Adjust width as needed in pixels height = 1200 # Adjust height as needed in pixels )
library(dplyr) library(tidyr) library(gt) library(rlang) rank <- function(team1, team2, column_names = list(), ...) { stats <- ensyms(...) formatted_date <- format(Sys.Date(), format = "%B %d, %Y")
fivetwo <- final_dset |> select(team_name, !!!stats) |> filter(team_name == !!team1 | team_name == !!team2) |> pivot_longer(cols = -team_name, names_to = "Stat", values_to = "Value") |> pivot_wider(names_from = team_name, values_from = Value) |> mutate(across(where(is.numeric), ~ signif(., 3))) |> mutate( Stat = case_when( Stat == "ERA_p" ~ "ERA", Stat == "WHIP_p" ~ "WHIP", Stat == "BB9_p" ~ "BB/9", Stat == "K_9_p" ~ "K/9", Stat == "H_9_p" ~ "H/9", Stat == "HR_9_p" ~ "HR/9", TRUE ~ Stat ) )
Create a gt table
table <- fivetwo |> gt() %>% gt_theme_538() %>% tab_header( title = md(paste("Game Preview:", team1, "vs", team2)), subtitle = md(paste("Key Statistics for the Season -", formatted_date, "")) ) %>% gt::tab_source_note(source_note = paste(format(Sys.Date(), format="%B %d, %Y"), "| data: baseballr")) %>% cols_label( Stat = "Statistic", !!sym(team1) := column_names[[1]] %||% team1, !!sym(team2) := column_names[[2]] %||% team2 ) %>%
Conditional formatting to highlight cells with greater values
mutate(across(all_of(c(team1, team2)), as.numeric)) %>% fmt_number( columns = all_of(c(team1, team2)), rows = Stat != "Statistic", decimals = 3 ) %>% tab_spanner( label = "Teams", columns = c(team1, team2) ) %>% tab_style( style = cell_text(weight = "bold"), locations = cells_column_labels(everything()) ) %>% tab_style( style = cell_borders( sides = "all", color = "grey", weight = px(1) ), locations = cells_body() ) %>% tab_options( table.font.size = px(12), table.width = pct(80), column_labels.font.weight = "bold", data_row.padding = px(5) ) %>% # Apply conditional formatting using style() based on comparison style( columns = all_of(c(team1, team2)), rows = Stat != "Statistic", value = if_else(.x > .y, "background-color: #2ecc71; color: black;", "") )
return(table) }
Custom column namescolumn_names <- list("New York Mets", "New York Yankees")
Call the functiontable <- rank("NYM", "NYY", column_names, AVG, OPS, R, HR, SB, ERA_p, WHIP_p, BB9_p, K_9_p, H_9_p, HR_9_p)
gtsave(table, filename = "mlb_game_preview.png")
— Reply to this email directly, view it on GitHub https://github.com/kndunlap/baseball-web-scrape/issues/1#issuecomment-2161632489, or unsubscribe https://github.com/notifications/unsubscribe-auth/AORVLBN2TJSCM73RJOW6GBDZG5UGXAVCNFSM6AAAAABD2QRC3GVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDCNRRGYZTENBYHE . You are receiving this because you commented.Message ID: @.***>
Yeah. Maybe Twitter too? I like the color idea
From: kndunlap @.> Sent: Tuesday, June 11, 2024 5:34:56 PM To: kndunlap/baseball-web-scrape @.> Cc: Jacob Aaron Kaye @.>; Author @.> Subject: Re: [kndunlap/baseball-web-scrape] question (Issue #1)
Ok. Got a lot of work dumped on me this week so I will try to work on it. What do you want to use these tables for? I think the one you made is good for like a paper media guide but for a TV broadcast, maybe we can incorporate color.
On Tue, Jun 11, 2024 at 3:32 PM jacobakaye @.***> wrote:
Yes. I just would like it more vertically if that makes sense. It looks very stretched out. Feel free to make whatever changes you want.
Load Packages -----------------------------------------------------------
library(tidyverse) library(rvest) library(baseballr) library(dplyr) library(tidyr) library(gt) library(rlang) library(gtExtras) library(cbbplotR)
HITTING -----------------------------------------------------------------------------------------------
Read in Baseball Savant -------------------------------------------------
savant_url <- "https://baseballsavant.mlb.com/league"html_savant <- read_html(savant_url)savant_tables <- html_savant |> html_table()
Read in and Clean bbref -------------------------------------------------
bbref_url <- "https://www.baseball-reference.com/leagues/majors/2024-standard-batting.shtml"html_bbref <- read_html(bbref_url)bbref_tables <- html_bbref |> html_table() bbref <- bbref_tables[[1]] bbref <- bbref |> slice(-31:-33)
Clean Savant ---------------------------------------------------
stable1 <- savant_tables[[1]] colnames(stable1) <- stable1[1,]stable1 <- stable1 |> slice(-1)stable1 <- stable1 |> slice(-31)stable1 stable2 <- savant_tables[[2]]stable2 <- stable2 |> slice(-31)stable2 stable3 <- savant_tables[[3]]stable3 <- stable3 |> slice(-31)stable3 savant_table <- cbind(stable1, stable2, stable3) savant_table <- savant_table[, !duplicated(names(savant_table))]savant_table <- as.tibble(savant_table) savant_table <- savant_table |> select(!27)savant_table <- savant_table |> select(!1) bbref <- bbref %>% mutate(across(c(PA, AB, H, HR, BB, SO,
2B
,3B
), ~as.numeric(gsub(",", "", .)))) savant_table <- savant_table %>% mutate(across(c(PA, AB, H, HR, BB, SO,2B
,3B
), ~as.numeric(gsub(",", "", .))))Join and Clean Up ----------------------------------------------------------------
combined <- savant_table |> full_join(bbref, join_by(AB, PA, H, HR, BB, SO,
2B
,3B
)) dataset <- combined |> relocate(Tm) |> select(-c(Season, BA.y, OBP.y, SLG.y)) |> rename( BA = BA.x, SLG = SLG.x, OBP = OBP.x, Team = Tm ) dataset_final <- dataset |> mutate(across(c(2:67), ~as.numeric(gsub(",", "", .)))) codes <- c("ATL", "CHW", "MIA", "WSN", "CLE", "LAA", "STL", "TOR", "MIN", "DET", "CIN", "BAL", "TBR", "COL", "SEA", "TEX", "KCR", "NYM", "OAK", "SFG", "CHC", "PHI", "PIT", "MIL", "HOU", "ARI", "BOS", "NYY", "SDP", "LAD") dataset2 <- dataset_final |> mutate(team_name = codes) |> relocate(team_name)Tack on and Clean Fangraphs ----------------------------------------------
fg_hitting <- (fg_team_batter(qual = "y", startseason = 2024, endseason = 2024)) fg_hitting <- fg_hitting |> mutate(across(c(PA, AB, H, HR, BB, SO,
2B
,3B
), ~as.numeric(gsub(",", "", .)))) fg_dataset <- dataset2 |> full_join(fg_hitting, by = c("PA", "AB", "H", "HR", "BB", "SO")) |> relocate(team_name.y) |> rename(team_name = team_name.y) |> rename(team_name.y = team_name.x) |> select(-team_name.y) hitter_final <- fg_dataset |> select(!ends_with(".y")) |> rename_with(~ str_remove(., "\.x$"), ends_with(".x"))PITCHING -------------------------------------------------------------------------------------------
Read in and Clean bbref -------------------------------------------------
bbref_url_p <- "https://www.baseball-reference.com/leagues/majors/2024-standard-pitching.shtml"html_bbref_p <- read_html(bbref_url_p)bbref_tables_p <- html_bbref_p |> html_table() bbref_p <- bbref_tables_p[[1]] bbref_p <- bbref_p |> slice(-31:-33)
Clean Savant ---------------------------------------------------
stable4 <- savant_tables[[4]] colnames(stable4) <- stable4[1,]stable4 <- stable4 |> slice(-1)stable4 <- stable4 |> slice(-31)stable4 stable5 <- savant_tables[[5]]stable5 <- stable5 |> slice(-31)stable5 stable6 <- savant_tables[[6]]stable6 <- stable6 |> slice(-31)stable6 savant_table_p <- cbind(stable4, stable5, stable6) savant_table_p <- savant_table_p[, !duplicated(names(savant_table_p))]savant_table_p <- as.tibble(savant_table_p) savant_table_p <- savant_table_p |> select(!27)savant_table_p <- savant_table_p |> select(!1) bbref_p <- bbref_p |> mutate(across(c(
#P
:LOB), ~as.numeric(gsub(",", "", .)))) savant_table_p <- savant_table_p |> mutate(across(c(PA:XWOBACON), ~as.numeric(gsub(",", "", .))))Join and Clean Up ----------------------------------------------------------------
combined_p <- savant_table_p |> full_join(bbref_p, join_by(H, HR, BB, SO)) dataset_p <- combined_p |> relocate(Tm) |> select(!2) dataset_final_p <- dataset_p |> mutate(across(c(2:81), ~as.numeric(gsub(",", "", .)))) codes_p <- c("ATL", "CLE", "STL", "BAL", "PHI", "WSN", "SEA", "TEX", "DET", "MIN", "BOS", "TOR", "LAD", "MIL", "CIN", "LAA", "NYM", "KCR", "CHC", "NYY", "PIT", "ARI", "MIA", "TBR", "SFG", "HOU", "COL", "OAK", "CHW", "SDP") dataset2_p <- dataset_final_p |> mutate(team_name = codes_p) |> relocate(team_name)
Tack on and Clean Fangraphs ----------------------------------------------
fg_pitching <- (fg_team_pitcher(qual = "y", startseason = 2024, endseason = 2024)) fg_dataset_p <- dataset2_p |> full_join(fg_pitching, by = "team_name") |> select(!ends_with(".y")) |> rename_with(~ str_remove(., "\.x$"), ends_with(".x")) |> rename_with(~ paste0(., "_p")) |> rename( Team = Tm_p, team_name = team_name_p )
JOIN HITTING AND PITCHING -----------------------------------------------
final_dset <- fg_dataset |> full_join(fg_dataset_p, by = "team_name") |> select(!ends_with(".y")) |> rename_with(~ str_remove(., "\.x$"), ends_with(".x")) |> relocate(HR, ERA_p, FIP_p, OPS, .after = Team)
Function ----------------------------------------------------------------
rank <- function(team1, team2, ...) { stats <- ensyms(...)
fivetwo <- final_dset |> select(team_name, !!!stats) |> filter(team_name == {{team1}} | team_name == {{team2}}) |> pivot_longer(cols = -team_name, names_to = "Stat", values_to = "Value") |> pivot_wider(names_from = team_name, values_from = Value) |> mutate(across(where(is.numeric), ~ signif(., 3)))
print(fivetwo)
}
Run This ----------------------------------------------------------------
table <- rank("DET", "NYY", AVG, OPS, R, HR, SB, ERA_p, WHIP_p, BB9_p, K_9_p, H_9_p, HR_9_p)
rank <- function(team1, team2, column_names = list(), ...) { stats <- ensyms(...) formatted_date <- format(Sys.Date(), format = "%B %d, %Y")
fivetwo <- final_dset |> select(team_name, !!!stats) |> filter(team_name == !!team1 | team_name == !!team2) |> pivot_longer(cols = -team_name, names_to = "Stat", values_to = "Value") |> pivot_wider(names_from = team_name, values_from = Value) |> mutate(across(where(is.numeric), ~ signif(., 3))) |> mutate( Stat = case_when( Stat == "ERA_p" ~ "ERA", Stat == "WHIP_p" ~ "WHIP", Stat == "BB9_p" ~ "BB/9", Stat == "K_9_p" ~ "K/9", Stat == "H_9_p" ~ "H/9", Stat == "HR_9_p" ~ "HR/9", TRUE ~ Stat ) )
Create a gt table
table <- fivetwo |> gt() %>% gt_theme_538() %>% tab_header( title = md(paste("Game Preview:", team1, "vs", team2)), subtitle = md(paste("Key Statistics for the Season -", formatted_date, "")) ) %>% gt::tab_source_note(source_note = paste(format(Sys.Date(), format="%B %d, %Y"), "| data: baseballr")) %>% cols_label( Stat = "Statistic", !!sym(team1) := column_names[[1]] %||% team1, !!sym(team2) := column_names[[2]] %||% team2 )
Conditional formatting for columns that exist
if ("AVG" %in% fivetwo$Stat) { table <- table |> fmt_number( columns = c(team1, team2), rows = Stat == "AVG", decimals = 3, drop_trailing_zeros = TRUE ) } if ("OPS" %in% fivetwo$Stat) { table <- table |> fmt_number( columns = c(team1, team2), rows = Stat == "OPS", decimals = 3, drop_trailing_zeros = TRUE ) } if ("R" %in% fivetwo$Stat) { table <- table |> fmt_number( columns = c(team1, team2), rows = Stat == "R", decimals = 0 ) } if ("HR" %in% fivetwo$Stat) { table <- table |> fmt_number( columns = c(team1, team2), rows = Stat == "HR", decimals = 0 ) } if ("SB" %in% fivetwo$Stat) { table <- table |> fmt_number( columns = c(team1, team2), rows = Stat == "SB", decimals = 0 ) } if ("ERA" %in% fivetwo$Stat) { table <- table |> fmt_number( columns = c(team1, team2), rows = Stat == "ERA", decimals = 2 ) } if ("WHIP" %in% fivetwo$Stat) { table <- table |> fmt_number( columns = c(team1, team2), rows = Stat == "WHIP", decimals = 2 ) } if ("BB/9" %in% fivetwo$Stat) { table <- table |> fmt_number( columns = c(team1, team2), rows = Stat == "BB/9", decimals = 2 ) } if ("K/9" %in% fivetwo$Stat) { table <- table |> fmt_number( columns = c(team1, team2), rows = Stat == "K/9", decimals = 2 ) } if ("H/9" %in% fivetwo$Stat) { table <- table |> fmt_number( columns = c(team1, team2), rows = Stat == "H/9", decimals = 2 ) } if ("HR/9" %in% fivetwo$Stat) { table <- table |> fmt_number( columns = c(team1, team2), rows = Stat == "HR/9", decimals = 2 ) }
table <- table |> tab_spanner( label = "Teams", columns = c(team1, team2) ) |> tab_style( style = cell_text(weight = "bold"), locations = cells_column_labels(everything()) ) |> tab_style( style = cell_borders( sides = "all", color = "grey", weight = px(1) ), locations = cells_body() ) |> tab_options( table.font.size = px(12), table.width = pct(80), column_labels.font.weight = "bold", data_row.padding = px(5) )
return(table) }
Custom column namescolumn_names <- list("New York Yankees", "Kansas City Royals")
Call the functiontable <- rank("NYY", "KCR", column_names, AVG, OPS, R, HR, SB, ERA_p, WHIP_p, BB9_p, K_9_p, H_9_p, HR_9_p)
Print the tabletable
gtsave( "mlb_game_preview.png", table, device = "png", width = 1600, # Adjust width as needed in pixels height = 1200 # Adjust height as needed in pixels )
library(dplyr) library(tidyr) library(gt) library(rlang) rank <- function(team1, team2, column_names = list(), ...) { stats <- ensyms(...) formatted_date <- format(Sys.Date(), format = "%B %d, %Y")
fivetwo <- final_dset |> select(team_name, !!!stats) |> filter(team_name == !!team1 | team_name == !!team2) |> pivot_longer(cols = -team_name, names_to = "Stat", values_to = "Value") |> pivot_wider(names_from = team_name, values_from = Value) |> mutate(across(where(is.numeric), ~ signif(., 3))) |> mutate( Stat = case_when( Stat == "ERA_p" ~ "ERA", Stat == "WHIP_p" ~ "WHIP", Stat == "BB9_p" ~ "BB/9", Stat == "K_9_p" ~ "K/9", Stat == "H_9_p" ~ "H/9", Stat == "HR_9_p" ~ "HR/9", TRUE ~ Stat ) )
Create a gt table
table <- fivetwo |> gt() %>% gt_theme_538() %>% tab_header( title = md(paste("Game Preview:", team1, "vs", team2)), subtitle = md(paste("Key Statistics for the Season -", formatted_date, "")) ) %>% gt::tab_source_note(source_note = paste(format(Sys.Date(), format="%B %d, %Y"), "| data: baseballr")) %>% cols_label( Stat = "Statistic", !!sym(team1) := column_names[[1]] %||% team1, !!sym(team2) := column_names[[2]] %||% team2 ) %>%
Conditional formatting to highlight cells with greater values
mutate(across(all_of(c(team1, team2)), as.numeric)) %>% fmt_number( columns = all_of(c(team1, team2)), rows = Stat != "Statistic", decimals = 3 ) %>% tab_spanner( label = "Teams", columns = c(team1, team2) ) %>% tab_style( style = cell_text(weight = "bold"), locations = cells_column_labels(everything()) ) %>% tab_style( style = cell_borders( sides = "all", color = "grey", weight = px(1) ), locations = cells_body() ) %>% tab_options( table.font.size = px(12), table.width = pct(80), column_labels.font.weight = "bold", data_row.padding = px(5) ) %>%
Apply conditional formatting using style() based on comparison
style( columns = all_of(c(team1, team2)), rows = Stat != "Statistic", value = if_else(.x > .y, "background-color: #2ecc71; color: black;", "") )
return(table) }
Custom column namescolumn_names <- list("New York Mets", "New York Yankees")
Call the functiontable <- rank("NYM", "NYY", column_names, AVG, OPS, R, HR, SB, ERA_p, WHIP_p, BB9_p, K_9_p, H_9_p, HR_9_p)
gtsave(table, filename = "mlb_game_preview.png")
— Reply to this email directly, view it on GitHub https://github.com/kndunlap/baseball-web-scrape/issues/1#issuecomment-2161632489, or unsubscribe https://github.com/notifications/unsubscribe-auth/AORVLBN2TJSCM73RJOW6GBDZG5UGXAVCNFSM6AAAAABD2QRC3GVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDCNRRGYZTENBYHE . You are receiving this because you commented.Message ID: @.***>
— Reply to this email directly, view it on GitHubhttps://github.com/kndunlap/baseball-web-scrape/issues/1#issuecomment-2161636658, or unsubscribehttps://github.com/notifications/unsubscribe-auth/A6FHSPVN43WEESX373LGJHDZG5UQBAVCNFSM6AAAAABD2QRC3GVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDCNRRGYZTMNRVHA. You are receiving this because you authored the thread.Message ID: @.***>
I'm definitely struggling to fix the large amount of space in the cells. I don't use chatgpt to make most of my code but I'm trying for this and not getting any good suggestions. Maybe when the table is made you can screenshot it from the bottom left window of RStudio. Other than that I am not sure...
On Tue, Jun 11, 2024 at 3:44 PM jacobakaye @.***> wrote:
Yeah. Maybe Twitter too? I like the color idea
From: kndunlap @.> Sent: Tuesday, June 11, 2024 5:34:56 PM To: kndunlap/baseball-web-scrape @.> Cc: Jacob Aaron Kaye @.>; Author @.> Subject: Re: [kndunlap/baseball-web-scrape] question (Issue #1)
Ok. Got a lot of work dumped on me this week so I will try to work on it. What do you want to use these tables for? I think the one you made is good for like a paper media guide but for a TV broadcast, maybe we can incorporate color.
On Tue, Jun 11, 2024 at 3:32 PM jacobakaye @.***> wrote:
Yes. I just would like it more vertically if that makes sense. It looks very stretched out. Feel free to make whatever changes you want.
Load Packages
library(tidyverse) library(rvest) library(baseballr) library(dplyr) library(tidyr) library(gt) library(rlang) library(gtExtras) library(cbbplotR)
HITTING
Read in Baseball Savant
savant_url <- "https://baseballsavant.mlb.com/league"html_savant <- read_html(savant_url)savant_tables <- html_savant |> html_table()
Read in and Clean bbref
bbref_url <- " https://www.baseball-reference.com/leagues/majors/2024-standard-batting.shtml"html_bbref <- read_html(bbref_url)bbref_tables <- html_bbref |> html_table() bbref <- bbref_tables[[1]] bbref <- bbref |> slice(-31:-33)
Clean Savant ---------------------------------------------------
stable1 <- savant_tables[[1]] colnames(stable1) <- stable1[1,]stable1 <- stable1 |> slice(-1)stable1 <- stable1 |> slice(-31)stable1 stable2 <- savant_tables[[2]]stable2 <- stable2 |> slice(-31)stable2 stable3 <- savant_tables[[3]]stable3 <- stable3 |> slice(-31)stable3 savant_table <- cbind(stable1, stable2, stable3) savant_table <- savant_table[, !duplicated(names(savant_table))]savant_table <- as.tibble(savant_table) savant_table <- savant_table |> select(!27)savant_table <- savant_table |> select(!1) bbref <- bbref %>% mutate(across(c(PA, AB, H, HR, BB, SO,
2B
,3B
), ~as.numeric(gsub(",", "", .)))) savant_table <- savant_table %>% mutate(across(c(PA, AB, H, HR, BB, SO,2B
,3B
), ~as.numeric(gsub(",", "", .))))Join and Clean Up
combined <- savant_table |> full_join(bbref, join_by(AB, PA, H, HR, BB, SO,
2B
,3B
)) dataset <- combined |> relocate(Tm) |> select(-c(Season, BA.y, OBP.y, SLG.y)) |> rename( BA = BA.x, SLG = SLG.x, OBP = OBP.x, Team = Tm ) dataset_final <- dataset |> mutate(across(c(2:67), ~as.numeric(gsub(",", "", .)))) codes <- c("ATL", "CHW", "MIA", "WSN", "CLE", "LAA", "STL", "TOR", "MIN", "DET", "CIN", "BAL", "TBR", "COL", "SEA", "TEX", "KCR", "NYM", "OAK", "SFG", "CHC", "PHI", "PIT", "MIL", "HOU", "ARI", "BOS", "NYY", "SDP", "LAD") dataset2 <- dataset_final |> mutate(team_name = codes) |> relocate(team_name)Tack on and Clean Fangraphs
fg_hitting <- (fg_team_batter(qual = "y", startseason = 2024, endseason = 2024)) fg_hitting <- fg_hitting |> mutate(across(c(PA, AB, H, HR, BB, SO,
2B
,3B
), ~as.numeric(gsub(",", "", .)))) fg_dataset <- dataset2 |> full_join(fg_hitting, by = c("PA", "AB", "H", "HR", "BB", "SO")) |> relocate(team_name.y) |> rename(team_name = team_name.y) |> rename(team_name.y = team_name.x) |> select(-team_name.y) hitter_final <- fg_dataset |> select(!ends_with(".y")) |> rename_with(~ str_remove(., "\.x$"), ends_with(".x"))PITCHING
Read in and Clean bbref
bbref_url_p <- " https://www.baseball-reference.com/leagues/majors/2024-standard-pitching.shtml"html_bbref_p <- read_html(bbref_url_p)bbref_tables_p <- html_bbref_p |> html_table() bbref_p <- bbref_tables_p[[1]] bbref_p <- bbref_p |> slice(-31:-33)
Clean Savant ---------------------------------------------------
stable4 <- savant_tables[[4]] colnames(stable4) <- stable4[1,]stable4 <- stable4 |> slice(-1)stable4 <- stable4 |> slice(-31)stable4 stable5 <- savant_tables[[5]]stable5 <- stable5 |> slice(-31)stable5 stable6 <- savant_tables[[6]]stable6 <- stable6 |> slice(-31)stable6 savant_table_p <- cbind(stable4, stable5, stable6) savant_table_p <- savant_table_p[, !duplicated(names(savant_table_p))]savant_table_p <- as.tibble(savant_table_p) savant_table_p <- savant_table_p |> select(!27)savant_table_p <- savant_table_p |> select(!1) bbref_p <- bbref_p |> mutate(across(c(
#P
:LOB), ~as.numeric(gsub(",", "", .)))) savant_table_p <- savant_table_p |> mutate(across(c(PA:XWOBACON), ~as.numeric(gsub(",", "", .))))Join and Clean Up
combined_p <- savant_table_p |> full_join(bbref_p, join_by(H, HR, BB, SO)) dataset_p <- combined_p |> relocate(Tm) |> select(!2) dataset_final_p <- dataset_p |> mutate(across(c(2:81), ~as.numeric(gsub(",", "", .)))) codes_p <- c("ATL", "CLE", "STL", "BAL", "PHI", "WSN", "SEA", "TEX", "DET", "MIN", "BOS", "TOR", "LAD", "MIL", "CIN", "LAA", "NYM", "KCR", "CHC", "NYY", "PIT", "ARI", "MIA", "TBR", "SFG", "HOU", "COL", "OAK", "CHW", "SDP") dataset2_p <- dataset_final_p |> mutate(team_name = codes_p) |> relocate(team_name)
Tack on and Clean Fangraphs
fg_pitching <- (fg_team_pitcher(qual = "y", startseason = 2024, endseason = 2024)) fg_dataset_p <- dataset2_p |> full_join(fg_pitching, by = "team_name") |> select(!ends_with(".y")) |> rename_with(~ str_remove(., "\.x$"), ends_with(".x")) |> rename_with(~ paste0(., "_p")) |> rename( Team = Tm_p, team_name = team_name_p )
JOIN HITTING AND PITCHING
final_dset <- fg_dataset |> full_join(fg_dataset_p, by = "team_name") |> select(!ends_with(".y")) |> rename_with(~ str_remove(., "\.x$"), ends_with(".x")) |> relocate(HR, ERA_p, FIP_p, OPS, .after = Team)
Function
rank <- function(team1, team2, ...) { stats <- ensyms(...)
fivetwo <- final_dset |> select(team_name, !!!stats) |> filter(team_name == {{team1}} | team_name == {{team2}}) |> pivot_longer(cols = -team_name, names_to = "Stat", values_to = "Value") |> pivot_wider(names_from = team_name, values_from = Value) |> mutate(across(where(is.numeric), ~ signif(., 3)))
print(fivetwo)
}
Run This
table <- rank("DET", "NYY", AVG, OPS, R, HR, SB, ERA_p, WHIP_p, BB9_p, K_9_p, H_9_p, HR_9_p)
rank <- function(team1, team2, column_names = list(), ...) { stats <- ensyms(...) formatted_date <- format(Sys.Date(), format = "%B %d, %Y")
fivetwo <- final_dset |> select(team_name, !!!stats) |> filter(team_name == !!team1 | team_name == !!team2) |> pivot_longer(cols = -team_name, names_to = "Stat", values_to = "Value") |> pivot_wider(names_from = team_name, values_from = Value) |> mutate(across(where(is.numeric), ~ signif(., 3))) |> mutate( Stat = case_when( Stat == "ERA_p" ~ "ERA", Stat == "WHIP_p" ~ "WHIP", Stat == "BB9_p" ~ "BB/9", Stat == "K_9_p" ~ "K/9", Stat == "H_9_p" ~ "H/9", Stat == "HR_9_p" ~ "HR/9", TRUE ~ Stat ) )
Create a gt table
table <- fivetwo |> gt() %>% gt_theme_538() %>% tab_header( title = md(paste("Game Preview:", team1, "vs", team2)), subtitle = md(paste("Key Statistics for the Season -", formatted_date, "")) ) %>% gt::tab_source_note(source_note = paste(format(Sys.Date(), format="%B %d, %Y"), "| data: baseballr")) %>% cols_label( Stat = "Statistic", !!sym(team1) := column_names[[1]] %||% team1, !!sym(team2) := column_names[[2]] %||% team2 )
Conditional formatting for columns that exist
if ("AVG" %in% fivetwo$Stat) { table <- table |> fmt_number( columns = c(team1, team2), rows = Stat == "AVG", decimals = 3, drop_trailing_zeros = TRUE ) } if ("OPS" %in% fivetwo$Stat) { table <- table |> fmt_number( columns = c(team1, team2), rows = Stat == "OPS", decimals = 3, drop_trailing_zeros = TRUE ) } if ("R" %in% fivetwo$Stat) { table <- table |> fmt_number( columns = c(team1, team2), rows = Stat == "R", decimals = 0 ) } if ("HR" %in% fivetwo$Stat) { table <- table |> fmt_number( columns = c(team1, team2), rows = Stat == "HR", decimals = 0 ) } if ("SB" %in% fivetwo$Stat) { table <- table |> fmt_number( columns = c(team1, team2), rows = Stat == "SB", decimals = 0 ) } if ("ERA" %in% fivetwo$Stat) { table <- table |> fmt_number( columns = c(team1, team2), rows = Stat == "ERA", decimals = 2 ) } if ("WHIP" %in% fivetwo$Stat) { table <- table |> fmt_number( columns = c(team1, team2), rows = Stat == "WHIP", decimals = 2 ) } if ("BB/9" %in% fivetwo$Stat) { table <- table |> fmt_number( columns = c(team1, team2), rows = Stat == "BB/9", decimals = 2 ) } if ("K/9" %in% fivetwo$Stat) { table <- table |> fmt_number( columns = c(team1, team2), rows = Stat == "K/9", decimals = 2 ) } if ("H/9" %in% fivetwo$Stat) { table <- table |> fmt_number( columns = c(team1, team2), rows = Stat == "H/9", decimals = 2 ) } if ("HR/9" %in% fivetwo$Stat) { table <- table |> fmt_number( columns = c(team1, team2), rows = Stat == "HR/9", decimals = 2 ) }
table <- table |> tab_spanner( label = "Teams", columns = c(team1, team2) ) |> tab_style( style = cell_text(weight = "bold"), locations = cells_column_labels(everything()) ) |> tab_style( style = cell_borders( sides = "all", color = "grey", weight = px(1) ), locations = cells_body() ) |> tab_options( table.font.size = px(12), table.width = pct(80), column_labels.font.weight = "bold", data_row.padding = px(5) )
return(table) }
Custom column namescolumn_names <- list("New York Yankees", "Kansas
City Royals")
Call the functiontable <- rank("NYY", "KCR", column_names, AVG, OPS,
R, HR, SB, ERA_p, WHIP_p, BB9_p, K_9_p, H_9_p, HR_9_p)
Print the tabletable
gtsave( "mlb_game_preview.png", table, device = "png", width = 1600, # Adjust width as needed in pixels height = 1200 # Adjust height as needed in pixels )
library(dplyr) library(tidyr) library(gt) library(rlang) rank <- function(team1, team2, column_names = list(), ...) { stats <- ensyms(...) formatted_date <- format(Sys.Date(), format = "%B %d, %Y")
fivetwo <- final_dset |> select(team_name, !!!stats) |> filter(team_name == !!team1 | team_name == !!team2) |> pivot_longer(cols = -team_name, names_to = "Stat", values_to = "Value") |> pivot_wider(names_from = team_name, values_from = Value) |> mutate(across(where(is.numeric), ~ signif(., 3))) |> mutate( Stat = case_when( Stat == "ERA_p" ~ "ERA", Stat == "WHIP_p" ~ "WHIP", Stat == "BB9_p" ~ "BB/9", Stat == "K_9_p" ~ "K/9", Stat == "H_9_p" ~ "H/9", Stat == "HR_9_p" ~ "HR/9", TRUE ~ Stat ) )
Create a gt table
table <- fivetwo |> gt() %>% gt_theme_538() %>% tab_header( title = md(paste("Game Preview:", team1, "vs", team2)), subtitle = md(paste("Key Statistics for the Season -", formatted_date, "")) ) %>% gt::tab_source_note(source_note = paste(format(Sys.Date(), format="%B %d, %Y"), "| data: baseballr")) %>% cols_label( Stat = "Statistic", !!sym(team1) := column_names[[1]] %||% team1, !!sym(team2) := column_names[[2]] %||% team2 ) %>%
Conditional formatting to highlight cells with greater values
mutate(across(all_of(c(team1, team2)), as.numeric)) %>% fmt_number( columns = all_of(c(team1, team2)), rows = Stat != "Statistic", decimals = 3 ) %>% tab_spanner( label = "Teams", columns = c(team1, team2) ) %>% tab_style( style = cell_text(weight = "bold"), locations = cells_column_labels(everything()) ) %>% tab_style( style = cell_borders( sides = "all", color = "grey", weight = px(1) ), locations = cells_body() ) %>% tab_options( table.font.size = px(12), table.width = pct(80), column_labels.font.weight = "bold", data_row.padding = px(5) ) %>%
Apply conditional formatting using style() based on comparison
style( columns = all_of(c(team1, team2)), rows = Stat != "Statistic", value = if_else(.x > .y, "background-color: #2ecc71; color: black;", "") )
return(table) }
Custom column namescolumn_names <- list("New York Mets", "New York
Yankees")
Call the functiontable <- rank("NYM", "NYY", column_names, AVG, OPS,
R, HR, SB, ERA_p, WHIP_p, BB9_p, K_9_p, H_9_p, HR_9_p)
gtsave(table, filename = "mlb_game_preview.png")
— Reply to this email directly, view it on GitHub < https://github.com/kndunlap/baseball-web-scrape/issues/1#issuecomment-2161632489>,
or unsubscribe < https://github.com/notifications/unsubscribe-auth/AORVLBN2TJSCM73RJOW6GBDZG5UGXAVCNFSM6AAAAABD2QRC3GVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDCNRRGYZTENBYHE>
. You are receiving this because you commented.Message ID: @.***>
— Reply to this email directly, view it on GitHub< https://github.com/kndunlap/baseball-web-scrape/issues/1#issuecomment-2161636658>, or unsubscribe< https://github.com/notifications/unsubscribe-auth/A6FHSPVN43WEESX373LGJHDZG5UQBAVCNFSM6AAAAABD2QRC3GVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDCNRRGYZTMNRVHA>.
You are receiving this because you authored the thread.Message ID: @.***>
— Reply to this email directly, view it on GitHub https://github.com/kndunlap/baseball-web-scrape/issues/1#issuecomment-2161649666, or unsubscribe https://github.com/notifications/unsubscribe-auth/AORVLBPXWRGIEHB3OJHKT5TZG5VTPAVCNFSM6AAAAABD2QRC3GVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDCNRRGY2DSNRWGY . You are receiving this because you commented.Message ID: @.***>
Hi sorry for the delay. I'll try to work on it. I just think it looks better more compact. What do you think about coloring?
More compact is good. Color is interesting, you could do where the team with the higher value is colored in that column. Or do individual team colors but that might be trickier.
On Sat, Jun 15, 2024 at 10:35 AM jacobakaye @.***> wrote:
Hi sorry for the delay. I'll try to work on it. I just think it looks better more compact. What do you think about coloring?
— Reply to this email directly, view it on GitHub https://github.com/kndunlap/baseball-web-scrape/issues/1#issuecomment-2170241847, or unsubscribe https://github.com/notifications/unsubscribe-auth/AORVLBIVEKNP6PFSENTCZEDZHRUNFAVCNFSM6AAAAABD2QRC3GVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDCNZQGI2DCOBUG4 . You are receiving this because you commented.Message ID: @.***>
Maybe bold the better stat instead of color?
I think that's a good idea. The color could be on the outsides just to give it some pop.
On Mon, Jun 17, 2024 at 12:00 PM jacobakaye @.***> wrote:
Maybe bold the better stat instead of color?
— Reply to this email directly, view it on GitHub https://github.com/kndunlap/baseball-web-scrape/issues/1#issuecomment-2174006488, or unsubscribe https://github.com/notifications/unsubscribe-auth/AORVLBNSIKPLGOPC665ONM3ZH4P4HAVCNFSM6AAAAABD2QRC3GVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDCNZUGAYDMNBYHA . You are receiving this because you commented.Message ID: @.***>
Something's wrong with this code, the stats are not accurate. Or at least the ERA isn't. Do you want to take a look?
# Load Packages -----------------------------------------------------------
library(tidyverse)
library(rvest)
library(baseballr)
library(dplyr)
library(tidyr)
library(gt)
library(rlang)
library(gtExtras)
library(cbbplotR)
# HITTING -----------------------------------------------------------------------------------------------
# Read in Baseball Savant -------------------------------------------------
savant_url <- "https://baseballsavant.mlb.com/league"
html_savant <- read_html(savant_url)
savant_tables <- html_savant |>
html_table()
# Read in and Clean bbref -------------------------------------------------
bbref_url <- "https://www.baseball-reference.com/leagues/majors/2024-standard-batting.shtml"
html_bbref <- read_html(bbref_url)
bbref_tables <- html_bbref |>
html_table()
bbref <- bbref_tables[[1]]
bbref <- bbref |>
slice(-31:-33)
# Clean Savant ---------------------------------------------------
stable1 <- savant_tables[[1]]
colnames(stable1) <- stable1[1,]
stable1 <- stable1 |> slice(-1)
stable1 <- stable1 |> slice(-31)
stable1
stable2 <- savant_tables[[2]]
stable2 <- stable2 |> slice(-31)
stable2
stable3 <- savant_tables[[3]]
stable3 <- stable3 |> slice(-31)
stable3
savant_table <- cbind(stable1, stable2, stable3)
savant_table <- savant_table[, !duplicated(names(savant_table))]
savant_table <- as.tibble(savant_table)
savant_table <- savant_table |> select(!27)
savant_table <- savant_table |> select(!1)
bbref <- bbref %>%
mutate(across(c(PA, AB, H, HR, BB, SO, `2B`, `3B`), ~as.numeric(gsub(",", "", .))))
savant_table <- savant_table %>%
mutate(across(c(PA, AB, H, HR, BB, SO, `2B`, `3B`), ~as.numeric(gsub(",", "", .))))
# Join and Clean Up ----------------------------------------------------------------
combined <- savant_table |>
full_join(bbref, join_by(AB, PA, H, HR, BB, SO, `2B`, `3B`))
dataset <- combined |>
relocate(Tm) |>
select(-c(Season, BA.y, OBP.y, SLG.y)) |>
rename(
BA = BA.x,
SLG = SLG.x,
OBP = OBP.x,
Team = Tm
)
dataset_final <- dataset |>
mutate(across(c(2:67), ~as.numeric(gsub(",", "", .))))
codes <- c("ATL", "CHW", "MIA", "WSN", "CLE", "LAA", "STL", "TOR", "MIN", "DET", "CIN", "BAL", "TBR", "COL", "SEA",
"TEX", "KCR", "NYM", "OAK", "SFG", "CHC", "PHI", "PIT", "MIL", "HOU", "ARI", "BOS", "NYY", "SDP", "LAD")
dataset2 <- dataset_final |>
mutate(team_name = codes) |>
relocate(team_name)
# Tack on and Clean Fangraphs ----------------------------------------------
fg_hitting <- (fg_team_batter(qual = "y", startseason = 2024, endseason = 2024))
fg_hitting <- fg_hitting |>
mutate(across(c(PA, AB, H, HR, BB, SO, `2B`, `3B`), ~as.numeric(gsub(",", "", .))))
fg_dataset <- dataset2 |>
full_join(fg_hitting, by = c("PA", "AB", "H", "HR", "BB", "SO")) |>
relocate(team_name.y) |>
rename(team_name = team_name.y) |>
rename(team_name.y = team_name.x) |>
select(-team_name.y)
hitter_final <- fg_dataset |>
select(!ends_with(".y")) |>
rename_with(~ str_remove(., "\\.x$"), ends_with(".x"))
# PITCHING -------------------------------------------------------------------------------------------
# Read in and Clean bbref -------------------------------------------------
bbref_url_p <- "https://www.baseball-reference.com/leagues/majors/2024-standard-pitching.shtml"
html_bbref_p <- read_html(bbref_url_p)
bbref_tables_p <- html_bbref_p |>
html_table()
bbref_p <- bbref_tables_p[[1]]
bbref_p <- bbref_p |>
slice(-31:-33)
# Clean Savant ---------------------------------------------------
stable4 <- savant_tables[[4]]
colnames(stable4) <- stable4[1,]
stable4 <- stable4 |> slice(-1)
stable4 <- stable4 |> slice(-31)
stable4
stable5 <- savant_tables[[5]]
stable5 <- stable5 |> slice(-31)
stable5
stable6 <- savant_tables[[6]]
stable6 <- stable6 |> slice(-31)
stable6
savant_table_p <- cbind(stable4, stable5, stable6)
savant_table_p <- savant_table_p[, !duplicated(names(savant_table_p))]
savant_table_p <- as.tibble(savant_table_p)
savant_table_p <- savant_table_p |> select(!27)
savant_table_p <- savant_table_p |> select(!1)
bbref_p <- bbref_p |>
mutate(across(c(`#P`:LOB), ~as.numeric(gsub(",", "", .))))
savant_table_p <- savant_table_p |>
mutate(across(c(PA:XWOBACON), ~as.numeric(gsub(",", "", .))))
# Join and Clean Up ----------------------------------------------------------------
combined_p <- savant_table_p |>
full_join(bbref_p, join_by(H, HR, BB, SO))
dataset_p <- combined_p |>
relocate(Tm) |>
select(!2)
dataset_final_p <- dataset_p |>
mutate(across(c(2:81), ~as.numeric(gsub(",", "", .))))
codes_p <- c("ATL", "CLE", "STL", "BAL", "PHI", "WSN", "SEA", "TEX", "DET", "MIN", "BOS", "TOR", "LAD", "MIL", "CIN",
"LAA", "NYM", "KCR", "CHC", "NYY", "PIT", "ARI", "MIA", "TBR", "SFG", "HOU", "COL", "OAK", "CHW", "SDP")
dataset2_p <- dataset_final_p |>
mutate(team_name = codes_p) |>
relocate(team_name)
# Tack on and Clean Fangraphs ----------------------------------------------
fg_pitching <- (fg_team_pitcher(qual = "y", startseason = 2024, endseason = 2024))
fg_dataset_p <- dataset2_p |>
full_join(fg_pitching, by = "team_name") |>
select(!ends_with(".y")) |>
rename_with(~ str_remove(., "\\.x$"), ends_with(".x")) |>
rename_with(~ paste0(., "_p")) |>
rename(
Team = Tm_p,
team_name = team_name_p
)
# JOIN HITTING AND PITCHING -----------------------------------------------
final_dset <- fg_dataset |>
full_join(fg_dataset_p, by = "team_name") |>
select(!ends_with(".y")) |>
rename_with(~ str_remove(., "\\.x$"), ends_with(".x")) |>
relocate(HR, ERA_p, FIP_p, OPS, .after = Team)
# Function ----------------------------------------------------------------
rank <- function(team1, team2, ...) {
stats <- ensyms(...)
fivetwo <- final_dset |>
select(team_name, !!!stats) |>
filter(team_name == {{team1}} | team_name == {{team2}}) |>
pivot_longer(cols = -team_name, names_to = "Stat", values_to = "Value") |>
pivot_wider(names_from = team_name, values_from = Value) |>
mutate(across(where(is.numeric), ~ signif(., 3)))
print(fivetwo)
}
# Run This ----------------------------------------------------------------
table <- rank("NYY", "BAL", AVG, OPS, R, HR, SB, ERA_p, WHIP_p, BB9_p, K_9_p, H_9_p, HR_9_p)
rank <- function(team1, team2, column_names = list(), ...) {
stats <- ensyms(...)
formatted_date <- format(Sys.Date(), format = "%B %d, %Y")
fivetwo <- final_dset |>
select(team_name, !!!stats) |>
filter(team_name == !!team1 | team_name == !!team2) |>
pivot_longer(cols = -team_name, names_to = "Stat", values_to = "Value") |>
pivot_wider(names_from = team_name, values_from = Value) |>
mutate(across(where(is.numeric), ~ signif(., 3))) |>
mutate(
Stat = case_when(
Stat == "ERA_p" ~ "ERA",
Stat == "WHIP_p" ~ "WHIP",
Stat == "BB9_p" ~ "BB/9",
Stat == "K_9_p" ~ "K/9",
Stat == "H_9_p" ~ "H/9",
Stat == "HR_9_p" ~ "HR/9",
TRUE ~ Stat
)
)
# Create a gt table
table <- fivetwo |>
gt() %>%
gt_theme_538() %>%
tab_header(
title = md(paste("Game Preview:", team1, "vs", team2)),
subtitle = md(paste("**Key Statistics for the Season -", formatted_date, "**"))
) %>%
gt::tab_source_note(source_note = paste(format(Sys.Date(), format="%B %d, %Y"), "| data: baseballr")) %>%
cols_label(
Stat = "Statistic",
!!sym(team1) := column_names[[1]] %||% team1,
!!sym(team2) := column_names[[2]] %||% team2
)
# Conditional formatting for columns that exist
if ("AVG" %in% fivetwo$Stat) {
table <- table |>
fmt_number(
columns = c(team1, team2),
rows = Stat == "AVG",
decimals = 3,
drop_trailing_zeros = TRUE
)
}
if ("OPS" %in% fivetwo$Stat) {
table <- table |>
fmt_number(
columns = c(team1, team2),
rows = Stat == "OPS",
decimals = 3,
drop_trailing_zeros = TRUE
)
}
if ("R" %in% fivetwo$Stat) {
table <- table |>
fmt_number(
columns = c(team1, team2),
rows = Stat == "R",
decimals = 0
)
}
if ("HR" %in% fivetwo$Stat) {
table <- table |>
fmt_number(
columns = c(team1, team2),
rows = Stat == "HR",
decimals = 0
)
}
if ("SB" %in% fivetwo$Stat) {
table <- table |>
fmt_number(
columns = c(team1, team2),
rows = Stat == "SB",
decimals = 0
)
}
if ("ERA" %in% fivetwo$Stat) {
table <- table |>
fmt_number(
columns = c(team1, team2),
rows = Stat == "ERA",
decimals = 2
)
}
if ("WHIP" %in% fivetwo$Stat) {
table <- table |>
fmt_number(
columns = c(team1, team2),
rows = Stat == "WHIP",
decimals = 2
)
}
if ("BB/9" %in% fivetwo$Stat) {
table <- table |>
fmt_number(
columns = c(team1, team2),
rows = Stat == "BB/9",
decimals = 2
)
}
if ("K/9" %in% fivetwo$Stat) {
table <- table |>
fmt_number(
columns = c(team1, team2),
rows = Stat == "K/9",
decimals = 2
)
}
if ("H/9" %in% fivetwo$Stat) {
table <- table |>
fmt_number(
columns = c(team1, team2),
rows = Stat == "H/9",
decimals = 2
)
}
if ("HR/9" %in% fivetwo$Stat) {
table <- table |>
fmt_number(
columns = c(team1, team2),
rows = Stat == "HR/9",
decimals = 2
)
}
table <- table |>
tab_spanner(
label = "Teams",
columns = c(team1, team2)
) |>
tab_style(
style = cell_text(weight = "bold"),
locations = cells_column_labels(everything())
) |>
tab_style(
style = cell_borders(
sides = "all",
color = "grey",
weight = px(1)
),
locations = cells_body()
) |>
tab_options(
table.font.size = px(12),
table.width = pct(80),
column_labels.font.weight = "bold",
data_row.padding = px(5)
)
return(table)
}
# Custom column names
column_names <- list("New York Yankees", "Baltimore Orioles")
# Call the function
table <- rank("NYY", "BAL", column_names, AVG, OPS, R, HR, SB, ERA_p, WHIP_p, BB9_p, K_9_p, H_9_p, HR_9_p)
# Print the table
table
gtsave(table, filename = "mlb_game_preview.png")
gtsave(
"mlb_game_preview.png",
table,
device = "png",
width = 1600, # Adjust width as needed in pixels
height = 1200 # Adjust height as needed in pixels
)
![mlb_game_preview](https://github.com/kndunlap/baseball-web-scrape/assets/126515518/009b88f9-055d-4bb7-9cb8-be14fc71619a)
Like it's the ERA of another team or the ERA is slightly off?
On Tue, Jun 18, 2024 at 9:35 AM jacobakaye @.***> wrote:
Something's wrong with this code, the stats are not accurate. Or at least the ERA isn't. Do you want to take a look?
Load Packages -----------------------------------------------------------
library(tidyverse) library(rvest) library(baseballr) library(dplyr) library(tidyr) library(gt) library(rlang) library(gtExtras) library(cbbplotR)
HITTING -----------------------------------------------------------------------------------------------
Read in Baseball Savant -------------------------------------------------
savant_url <- "https://baseballsavant.mlb.com/league"html_savant <- read_html(savant_url)savant_tables <- html_savant |> html_table()
Read in and Clean bbref -------------------------------------------------
bbref_url <- "https://www.baseball-reference.com/leagues/majors/2024-standard-batting.shtml"html_bbref <- read_html(bbref_url)bbref_tables <- html_bbref |> html_table() bbref <- bbref_tables[[1]] bbref <- bbref |> slice(-31:-33)
Clean Savant ---------------------------------------------------
stable1 <- savant_tables[[1]] colnames(stable1) <- stable1[1,]stable1 <- stable1 |> slice(-1)stable1 <- stable1 |> slice(-31)stable1 stable2 <- savant_tables[[2]]stable2 <- stable2 |> slice(-31)stable2 stable3 <- savant_tables[[3]]stable3 <- stable3 |> slice(-31)stable3 savant_table <- cbind(stable1, stable2, stable3) savant_table <- savant_table[, !duplicated(names(savant_table))]savant_table <- as.tibble(savant_table) savant_table <- savant_table |> select(!27)savant_table <- savant_table |> select(!1) bbref <- bbref %>% mutate(across(c(PA, AB, H, HR, BB, SO,
2B
,3B
), ~as.numeric(gsub(",", "", .)))) savant_table <- savant_table %>% mutate(across(c(PA, AB, H, HR, BB, SO,2B
,3B
), ~as.numeric(gsub(",", "", .))))Join and Clean Up ----------------------------------------------------------------
combined <- savant_table |> full_join(bbref, join_by(AB, PA, H, HR, BB, SO,
2B
,3B
)) dataset <- combined |> relocate(Tm) |> select(-c(Season, BA.y, OBP.y, SLG.y)) |> rename( BA = BA.x, SLG = SLG.x, OBP = OBP.x, Team = Tm ) dataset_final <- dataset |> mutate(across(c(2:67), ~as.numeric(gsub(",", "", .)))) codes <- c("ATL", "CHW", "MIA", "WSN", "CLE", "LAA", "STL", "TOR", "MIN", "DET", "CIN", "BAL", "TBR", "COL", "SEA", "TEX", "KCR", "NYM", "OAK", "SFG", "CHC", "PHI", "PIT", "MIL", "HOU", "ARI", "BOS", "NYY", "SDP", "LAD") dataset2 <- dataset_final |> mutate(team_name = codes) |> relocate(team_name)Tack on and Clean Fangraphs ----------------------------------------------
fg_hitting <- (fg_team_batter(qual = "y", startseason = 2024, endseason = 2024)) fg_hitting <- fg_hitting |> mutate(across(c(PA, AB, H, HR, BB, SO,
2B
,3B
), ~as.numeric(gsub(",", "", .)))) fg_dataset <- dataset2 |> full_join(fg_hitting, by = c("PA", "AB", "H", "HR", "BB", "SO")) |> relocate(team_name.y) |> rename(team_name = team_name.y) |> rename(team_name.y = team_name.x) |> select(-team_name.y) hitter_final <- fg_dataset |> select(!ends_with(".y")) |> rename_with(~ str_remove(., "\.x$"), ends_with(".x"))PITCHING -------------------------------------------------------------------------------------------
Read in and Clean bbref -------------------------------------------------
bbref_url_p <- "https://www.baseball-reference.com/leagues/majors/2024-standard-pitching.shtml"html_bbref_p <- read_html(bbref_url_p)bbref_tables_p <- html_bbref_p |> html_table() bbref_p <- bbref_tables_p[[1]] bbref_p <- bbref_p |> slice(-31:-33)
Clean Savant ---------------------------------------------------
stable4 <- savant_tables[[4]] colnames(stable4) <- stable4[1,]stable4 <- stable4 |> slice(-1)stable4 <- stable4 |> slice(-31)stable4 stable5 <- savant_tables[[5]]stable5 <- stable5 |> slice(-31)stable5 stable6 <- savant_tables[[6]]stable6 <- stable6 |> slice(-31)stable6 savant_table_p <- cbind(stable4, stable5, stable6) savant_table_p <- savant_table_p[, !duplicated(names(savant_table_p))]savant_table_p <- as.tibble(savant_table_p) savant_table_p <- savant_table_p |> select(!27)savant_table_p <- savant_table_p |> select(!1) bbref_p <- bbref_p |> mutate(across(c(
#P
:LOB), ~as.numeric(gsub(",", "", .)))) savant_table_p <- savant_table_p |> mutate(across(c(PA:XWOBACON), ~as.numeric(gsub(",", "", .))))Join and Clean Up ----------------------------------------------------------------
combined_p <- savant_table_p |> full_join(bbref_p, join_by(H, HR, BB, SO)) dataset_p <- combined_p |> relocate(Tm) |> select(!2) dataset_final_p <- dataset_p |> mutate(across(c(2:81), ~as.numeric(gsub(",", "", .)))) codes_p <- c("ATL", "CLE", "STL", "BAL", "PHI", "WSN", "SEA", "TEX", "DET", "MIN", "BOS", "TOR", "LAD", "MIL", "CIN", "LAA", "NYM", "KCR", "CHC", "NYY", "PIT", "ARI", "MIA", "TBR", "SFG", "HOU", "COL", "OAK", "CHW", "SDP") dataset2_p <- dataset_final_p |> mutate(team_name = codes_p) |> relocate(team_name)
Tack on and Clean Fangraphs ----------------------------------------------
fg_pitching <- (fg_team_pitcher(qual = "y", startseason = 2024, endseason = 2024)) fg_dataset_p <- dataset2_p |> full_join(fg_pitching, by = "team_name") |> select(!ends_with(".y")) |> rename_with(~ str_remove(., "\.x$"), ends_with(".x")) |> rename_with(~ paste0(., "_p")) |> rename( Team = Tm_p, team_name = team_name_p )
JOIN HITTING AND PITCHING -----------------------------------------------
final_dset <- fg_dataset |> full_join(fg_dataset_p, by = "team_name") |> select(!ends_with(".y")) |> rename_with(~ str_remove(., "\.x$"), ends_with(".x")) |> relocate(HR, ERA_p, FIP_p, OPS, .after = Team)
Function ----------------------------------------------------------------
rank <- function(team1, team2, ...) { stats <- ensyms(...)
fivetwo <- final_dset |> select(team_name, !!!stats) |> filter(team_name == {{team1}} | team_name == {{team2}}) |> pivot_longer(cols = -team_name, names_to = "Stat", values_to = "Value") |> pivot_wider(names_from = team_name, values_from = Value) |> mutate(across(where(is.numeric), ~ signif(., 3)))
print(fivetwo)
}
Run This ----------------------------------------------------------------
table <- rank("NYY", "BAL", AVG, OPS, R, HR, SB, ERA_p, WHIP_p, BB9_p, K_9_p, H_9_p, HR_9_p)
rank <- function(team1, team2, column_names = list(), ...) { stats <- ensyms(...) formatted_date <- format(Sys.Date(), format = "%B %d, %Y")
fivetwo <- final_dset |> select(team_name, !!!stats) |> filter(team_name == !!team1 | team_name == !!team2) |> pivot_longer(cols = -team_name, names_to = "Stat", values_to = "Value") |> pivot_wider(names_from = team_name, values_from = Value) |> mutate(across(where(is.numeric), ~ signif(., 3))) |> mutate( Stat = case_when( Stat == "ERA_p" ~ "ERA", Stat == "WHIP_p" ~ "WHIP", Stat == "BB9_p" ~ "BB/9", Stat == "K_9_p" ~ "K/9", Stat == "H_9_p" ~ "H/9", Stat == "HR_9_p" ~ "HR/9", TRUE ~ Stat ) )
Create a gt table
table <- fivetwo |> gt() %>% gt_theme_538() %>% tab_header( title = md(paste("Game Preview:", team1, "vs", team2)), subtitle = md(paste("Key Statistics for the Season -", formatted_date, "")) ) %>% gt::tab_source_note(source_note = paste(format(Sys.Date(), format="%B %d, %Y"), "| data: baseballr")) %>% cols_label( Stat = "Statistic", !!sym(team1) := column_names[[1]] %||% team1, !!sym(team2) := column_names[[2]] %||% team2 )
Conditional formatting for columns that exist
if ("AVG" %in% fivetwo$Stat) { table <- table |> fmt_number( columns = c(team1, team2), rows = Stat == "AVG", decimals = 3, drop_trailing_zeros = TRUE ) } if ("OPS" %in% fivetwo$Stat) { table <- table |> fmt_number( columns = c(team1, team2), rows = Stat == "OPS", decimals = 3, drop_trailing_zeros = TRUE ) } if ("R" %in% fivetwo$Stat) { table <- table |> fmt_number( columns = c(team1, team2), rows = Stat == "R", decimals = 0 ) } if ("HR" %in% fivetwo$Stat) { table <- table |> fmt_number( columns = c(team1, team2), rows = Stat == "HR", decimals = 0 ) } if ("SB" %in% fivetwo$Stat) { table <- table |> fmt_number( columns = c(team1, team2), rows = Stat == "SB", decimals = 0 ) } if ("ERA" %in% fivetwo$Stat) { table <- table |> fmt_number( columns = c(team1, team2), rows = Stat == "ERA", decimals = 2 ) } if ("WHIP" %in% fivetwo$Stat) { table <- table |> fmt_number( columns = c(team1, team2), rows = Stat == "WHIP", decimals = 2 ) } if ("BB/9" %in% fivetwo$Stat) { table <- table |> fmt_number( columns = c(team1, team2), rows = Stat == "BB/9", decimals = 2 ) } if ("K/9" %in% fivetwo$Stat) { table <- table |> fmt_number( columns = c(team1, team2), rows = Stat == "K/9", decimals = 2 ) } if ("H/9" %in% fivetwo$Stat) { table <- table |> fmt_number( columns = c(team1, team2), rows = Stat == "H/9", decimals = 2 ) } if ("HR/9" %in% fivetwo$Stat) { table <- table |> fmt_number( columns = c(team1, team2), rows = Stat == "HR/9", decimals = 2 ) }
table <- table |> tab_spanner( label = "Teams", columns = c(team1, team2) ) |> tab_style( style = cell_text(weight = "bold"), locations = cells_column_labels(everything()) ) |> tab_style( style = cell_borders( sides = "all", color = "grey", weight = px(1) ), locations = cells_body() ) |> tab_options( table.font.size = px(12), table.width = pct(80), column_labels.font.weight = "bold", data_row.padding = px(5) )
return(table) }
Custom column namescolumn_names <- list("New York Yankees", "Baltimore Orioles")
Call the functiontable <- rank("NYY", "BAL", column_names, AVG, OPS, R, HR, SB, ERA_p, WHIP_p, BB9_p, K_9_p, H_9_p, HR_9_p)
Print the tabletable
gtsave(table, filename = "mlb_game_preview.png")
gtsave( "mlb_game_preview.png", table, device = "png", width = 1600, # Adjust width as needed in pixels height = 1200 # Adjust height as needed in pixels )
— Reply to this email directly, view it on GitHub https://github.com/kndunlap/baseball-web-scrape/issues/1#issuecomment-2176400568, or unsubscribe https://github.com/notifications/unsubscribe-auth/AORVLBKI7XA5OPHO3YGO6B3ZIBHUBAVCNFSM6AAAAABD2QRC3GVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDCNZWGQYDANJWHA . You are receiving this because you commented.Message ID: @.***>
The orioles ERA is accurate, yankees is 4.58 which looks to be Arizona.
WHIP same issue
Oh boy. Alright that probably has something to do with how I joined the teams. I will try and work on that today.
On Tue, Jun 18, 2024 at 9:44 AM jacobakaye @.***> wrote:
WHIP same issue
— Reply to this email directly, view it on GitHub https://github.com/kndunlap/baseball-web-scrape/issues/1#issuecomment-2176420668, or unsubscribe https://github.com/notifications/unsubscribe-auth/AORVLBP66DBAHHBINB2IBQ3ZIBIW5AVCNFSM6AAAAABD2QRC3GVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDCNZWGQZDANRWHA . You are receiving this because you commented.Message ID: @.***>
Should be fixed. It was an issue with how I combined team names ("Detroit Tigers") with team codes ("DET"). Can you give it a try now.
The whole script is under "collab.R"
On Tue, Jun 18, 2024 at 9:56 AM Kyle Dunlap @.***> wrote:
Oh boy. Alright that probably has something to do with how I joined the teams. I will try and work on that today.
On Tue, Jun 18, 2024 at 9:44 AM jacobakaye @.***> wrote:
WHIP same issue
— Reply to this email directly, view it on GitHub https://github.com/kndunlap/baseball-web-scrape/issues/1#issuecomment-2176420668, or unsubscribe https://github.com/notifications/unsubscribe-auth/AORVLBP66DBAHHBINB2IBQ3ZIBIW5AVCNFSM6AAAAABD2QRC3GVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDCNZWGQZDANRWHA . You are receiving this because you commented.Message ID: @.***>
Hi. I just checked it out and now see NA values in there for some
Ok interesting. Wonder why that is. I'll take another look today.
On Wed, Jun 19, 2024 at 9:02 AM jacobakaye @.***> wrote:
Hi. I just checked it out and now see NA values in there for some
mlb_game_preview.png (view on web) https://github.com/kndunlap/baseball-web-scrape/assets/126515518/7db91595-2a7e-44a8-94a4-891236c42a15
— Reply to this email directly, view it on GitHub https://github.com/kndunlap/baseball-web-scrape/issues/1#issuecomment-2178925926, or unsubscribe https://github.com/notifications/unsubscribe-auth/AORVLBMXXZSAUQ4R5AS3EO3ZIGMQZAVCNFSM6AAAAABD2QRC3GVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDCNZYHEZDKOJSGY . You are receiving this because you commented.Message ID: @.***>
I got the error, but then I ran everything again and it worked. The reason was that the bbref url was importing the wrong stats, but it seems to have fixed itself. Can you give it another try?
On Wed, Jun 19, 2024 at 9:11 AM Kyle Dunlap @.***> wrote:
Ok interesting. Wonder why that is. I'll take another look today.
On Wed, Jun 19, 2024 at 9:02 AM jacobakaye @.***> wrote:
Hi. I just checked it out and now see NA values in there for some
mlb_game_preview.png (view on web) https://github.com/kndunlap/baseball-web-scrape/assets/126515518/7db91595-2a7e-44a8-94a4-891236c42a15
— Reply to this email directly, view it on GitHub https://github.com/kndunlap/baseball-web-scrape/issues/1#issuecomment-2178925926, or unsubscribe https://github.com/notifications/unsubscribe-auth/AORVLBMXXZSAUQ4R5AS3EO3ZIGMQZAVCNFSM6AAAAABD2QRC3GVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDCNZYHEZDKOJSGY . You are receiving this because you commented.Message ID: @.***>
Yes it seemed to work. I just would love to fix the cropping to make it more compact
In tab.options() change the table.width to pct(40). That should work.
On Fri, Jun 21, 2024 at 12:16 PM jacobakaye @.***> wrote:
Yes it seemed to work. I just would love to fix the cropping to make it more compact
— Reply to this email directly, view it on GitHub https://github.com/kndunlap/baseball-web-scrape/issues/1#issuecomment-2183221125, or unsubscribe https://github.com/notifications/unsubscribe-auth/AORVLBNI3LNZM3LXE2RJDGLZIRUWVAVCNFSM6AAAAABD2QRC3GVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDCOBTGIZDCMJSGU . You are receiving this because you commented.Message ID: @.***>
changes in collab.R now
On Fri, Jun 21, 2024 at 12:43 PM Kyle Dunlap @.***> wrote:
In tab.options() change the table.width to pct(40). That should work.
On Fri, Jun 21, 2024 at 12:16 PM jacobakaye @.***> wrote:
Yes it seemed to work. I just would love to fix the cropping to make it more compact
— Reply to this email directly, view it on GitHub https://github.com/kndunlap/baseball-web-scrape/issues/1#issuecomment-2183221125, or unsubscribe https://github.com/notifications/unsubscribe-auth/AORVLBNI3LNZM3LXE2RJDGLZIRUWVAVCNFSM6AAAAABD2QRC3GVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDCOBTGIZDCMJSGU . You are receiving this because you commented.Message ID: @.***>
Awesome. Thank you. I don't know how this would be possible but adding team logos in there would be sick. Obviously using mlbplotR, but assuming this would be complicated.
Team logos where? You have used mlbplotR in the past to great success so I bet you could figure it out.
On Sat, Jun 22, 2024 at 1:07 PM jacobakaye @.***> wrote:
Awesome. Thank you. I don't know how this would be possible but adding team logos in there would be sick. Obviously using mlbplotR, but assuming this would be complicated.
— Reply to this email directly, view it on GitHub https://github.com/kndunlap/baseball-web-scrape/issues/1#issuecomment-2184158288, or unsubscribe https://github.com/notifications/unsubscribe-auth/AORVLBN7U5EUZ22NJMMTR5TZIXDQ3AVCNFSM6AAAAABD2QRC3GVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDCOBUGE2TQMRYHA . You are receiving this because you commented.Message ID: @.***>
what do I need to download to do this. i tried and it didn't work