Closed davescroggs closed 2 years ago
thanks @davescroggs !! i'm finding some other errors too now that i'm starting to analyse the data rather than make tibbles with it... 🚨
g'day @davescroggs - could you help me with this? i'm trying to figure out how to isolate the records that are erroneous. how would you suggest identifying xy locations that fall within the 3pt shot arc?
Hey @jacquietran I found it by calculating the distance from the centre of the ring and finding any shots that were less than 6.75m and tagged as 3pt. I also plotted them to check they weren't corner 3's too, and from memory none of them were. Code for shot distance below.
shots %>% mutate(x = x / 100 28, y = y / 100 15, shot_dist = sqrt((x - 1.6)^2 + (y - 7.5)^2 ))
Kia ora @davescroggs - ahh I see! Thanks for the help! I'll tidy this up today 😸
Taking a closer look using data from games played before today 2021-12-19 (i.e., including games from the 2021 season that is in progress).
# Load libraries
library(tidyverse)
library(ggplot2)
library(ggforce)
#
shots %>%
mutate(
x_for_halfcourt_plot = if_else(x > 50, 100 - x, x),
x_adj_halfcourt = x_for_halfcourt_plot / 100 * 28,
y_adj = y / 100 * 15,
shot_dist = sqrt((x_adj_halfcourt - 1.6)^2 + (y_adj - 7.5)^2 )) %>%
filter(action_type == "3pt") %>%
filter(shot_dist < 6.75) %>%
mutate(
# Categorisation allowing for small degree of human error when coding 3-pt attempt locations
shot_category = case_when(
# Recorded location is erroneous when shot distance < 6 m from ring
shot_dist < 6 ~ "erroneous",
# Recorded location is accepted when shot distance >= 6 m from ring
shot_dist >= 6 ~ "accepted")) %>%
arrange(
desc(shot_dist)
) -> short_3pt_shots
# Set court features
add_forward_goal_circle <- function() geom_arc_bar(
aes(x0 = 50, y0 = 200, r0 = 0, r = 4.9/0.1525,
start = pi / 2, end = 3 / 2 * pi),
inherit.aes = FALSE)
add_key <- function() geom_rect(
xmin = 0, xmax = 5.8, ymin = 9.95, ymax = 5.05, col = "black", fill = NA)
add_3pt_arc <- function() geom_arc(
aes(x0 = 1.575, y0 = 7.5, r = 6.75,
start = pi - 0.1862454, end = 0.1862454),
col = "black", inherit.aes = FALSE)
add_halfcourt <- function() geom_rect(
xmin = 0, xmax = 14, ymin = 0, ymax = 15, col = "black",
fill = NA, inherit.aes = FALSE)
add_top_key <- function() geom_arc(
aes(x0 = 5.8, y0 = 7.5, r = 1.8, start = pi, end = 0),
col = "black", inherit.aes = FALSE)
add_backboard <- function() geom_segment(
aes(x = 1.2, xend = 1.2, y = 6.6, yend = 8.4), col = "black")
add_basket <- function() geom_circle(
aes(x0 = 1.675, y0 = 7.5, r = 0.45/2), col = "black", fill = NA,
inherit.aes = FALSE)
add_centre_circle <- function() geom_arc(
aes(x0 = 14, y0 = 7.5, r = 1.8, start = pi, end = 2*pi),
col = "black", inherit.aes = FALSE)
add_3ball_segment1 <- function() geom_segment(
aes(x = 0, xend = 2.825, y = 0.9, yend = 0.9), inherit.aes = FALSE)
add_3ball_segment2 <- function() geom_segment(
aes(x = 0, xend = 2.825, y = 14.1, yend = 14.1), inherit.aes = FALSE)
# Plot
p <- ggplot(
short_3pt_shots, aes(x = x_adj_halfcourt, y = y_adj))
p <- p + add_key()
p <- p + add_top_key()
p <- p + add_halfcourt()
p <- p + add_backboard()
p <- p + add_basket()
p <- p + add_3ball_segment1()
p <- p + add_3ball_segment2()
p <- p + add_3pt_arc()
p <- p + add_centre_circle()
p <- p + geom_point(
aes(colour = shot_category),
stat = "identity", size = 3, alpha = 0.5)
p <- p + scale_colour_manual(
values = c(
"accepted" = "dodgerblue",
"erroneous" = "red"))
p <- p + coord_fixed(xlim = c(0,14), ylim = c(0,15))
p <- p + facet_wrap(~season, nrow = 2)
p <- p + theme_void()
action_type == "3pt"
)"2pt"
(and retaining the XY location as recorded)Looking only at the 3-pt shot locations I have categorised as "erroneous" (allowing for some human error in specifying XY location)
short_3pt_shots_erroneous <- short_3pt_shots %>%
filter(shot_category == "erroneous")
# Plot
p <- ggplot(
short_3pt_shots_erroneous, aes(x = x_adj_halfcourt, y = y_adj))
p <- p + add_key()
p <- p + add_top_key()
p <- p + add_halfcourt()
p <- p + add_backboard()
p <- p + add_basket()
p <- p + add_3ball_segment1()
p <- p + add_3ball_segment2()
p <- p + add_3pt_arc()
p <- p + add_centre_circle()
p <- p + geom_point(
colour = "red", stat = "identity", size = 3, alpha = 0.5)
p <- p + coord_fixed(xlim = c(0,14), ylim = c(0,15))
p <- p + facet_wrap(~season, nrow = 2)
p <- p + theme_void()
short_3pt_shots_erroneous %>%
select(
season, page_id, action_number, team_name, team_name_opp,
shot_result, x, y, x_adj_halfcourt, y_adj, shot_dist, scoreboard_name) %>%
arrange(desc(season), desc(shot_dist)) %>%
gt::gt()
data tidying to do for all of the above:
shots
and pbp
data, update shot type to "2pt"shots
, keep XY location as isbox_scores
and box_scores_detailed
, adjust two pointers attempted and % and three pointers attempted and %Note: In my checking process, I looked through the pbp
data as well and there may be some issues potentially with the action numbers - they might not always be in chronological order??? Will investigate further and open a separate issue about this if needed.
Couldn't find video of the full game (maybe accessible via Kayo as VOD, but I don't have a subscription). Videos online (on Youtube and Facebook) are highlight reels only and do not include this particular shot, taken at the end of the 1st quarter.
shots
and pbp
data, update shot type to "2pt"shots
, keep XY location as isbox_scores
and box_scores_detailed
, adjust two pointers attempted and % and three pointers attempted and %Couldn't find video of the full game (maybe accessible via Kayo as VOD, but I don't have a subscription).
shots
and pbp
data, update shot type to "2pt"shots
, keep XY location as isbox_scores
and box_scores_detailed
, adjust two pointers attempted and % and three pointers attempted and %Couldn't find video of the full game.
There are a set of distances recorded with action_type == "3pt"
but are quite close to the ring. All of these shots have an XY location that is < 2 m from the ring:
For each of the above, data tidying to do:
shots
and pbp
data, update shot type to "2pt"shots
, keep XY location as isbox_scores
and box_scores_detailed
, adjust two pointers attempted and % and three pointers attempted and %These shots have an XY location that is < 2 m from the ring:
For each of the above, data tidying to do:
shots
and pbp
data, update shot type to "2pt"shots
, keep XY location as isbox_scores
and box_scores_detailed
, adjust two pointers attempted and % and three pointers attempted and %These shots have an XY location that is inside the key:
For each of the above, data tidying to do:
shots
and pbp
data, update shot type to "2pt"shots
, keep XY location as isbox_scores
and box_scores_detailed
, adjust two pointers attempted and % and three pointers attempted and %summarising the checks above:
or simpler, focusing on shots to be updated only:
short_3pt_shots_erroneous %>%
select(
season, page_id, action_number, team_name, team_name_opp, shot_dist,
scoreboard_name) %>%
arrange(desc(season), desc(shot_dist)) %>%
mutate(
to_be_updated = case_when(
page_id %in% c(
1997422,
1330488,
1087577,
681947,
681956,
681960,
681968,
681977,
137261,
137309,
64580,
64597) ~ TRUE,
page_id == 681958 &
action_number == 107 ~ TRUE,
page_id == 681958 &
action_number == 188 ~ TRUE,
page_id == 681958 &
action_number == 319 ~ TRUE,
page_id == 681974 &
action_number == 306 ~ TRUE,
page_id == 681974 &
action_number == 359 ~ TRUE,
TRUE ~ FALSE)) %>%
filter(to_be_updated == TRUE) %>%
gt::gt()
For each of the shots listed in the screenshot immediately above, data tidying to do:
^ oops, that last commit message was meant to read "in 2017 games"
This will be ready to close when the beta branch of {wnblr} is merged.
More just for awareness, but some (~20-30) of the records in the "shots" data frame have an action_type of "3pt" when their location puts them well inside the 3pt-line. All instances are missed shots so likely operator error. Just an FYI.