jacquietran / wnblr

An R package containing game stats from the Women's National Basketball League (WNBL).
Other
10 stars 0 forks source link

Erroneous action types for 3pt shots #29

Closed davescroggs closed 2 years ago

davescroggs commented 2 years ago

More just for awareness, but some (~20-30) of the records in the "shots" data frame have an action_type of "3pt" when their location puts them well inside the 3pt-line. All instances are missed shots so likely operator error. Just an FYI.

jacquietran commented 2 years ago

thanks @davescroggs !! i'm finding some other errors too now that i'm starting to analyse the data rather than make tibbles with it... 🚨

jacquietran commented 2 years ago

g'day @davescroggs - could you help me with this? i'm trying to figure out how to isolate the records that are erroneous. how would you suggest identifying xy locations that fall within the 3pt shot arc?

davescroggs commented 2 years ago

Hey @jacquietran I found it by calculating the distance from the centre of the ring and finding any shots that were less than 6.75m and tagged as 3pt. I also plotted them to check they weren't corner 3's too, and from memory none of them were. Code for shot distance below.

shots %>% mutate(x = x / 100 28, y = y / 100 15, shot_dist = sqrt((x - 1.6)^2 + (y - 7.5)^2 ))

jacquietran commented 2 years ago

Kia ora @davescroggs - ahh I see! Thanks for the help! I'll tidy this up today 😸

jacquietran commented 2 years ago

Taking a closer look using data from games played before today 2021-12-19 (i.e., including games from the 2021 season that is in progress).


# Load libraries
library(tidyverse)
library(ggplot2)
library(ggforce)

# 
shots %>%
    mutate(
        x_for_halfcourt_plot = if_else(x > 50, 100 - x, x),
        x_adj_halfcourt = x_for_halfcourt_plot / 100 * 28,
                y_adj = y / 100 * 15,
        shot_dist = sqrt((x_adj_halfcourt - 1.6)^2 + (y_adj - 7.5)^2 )) %>%
    filter(action_type == "3pt") %>%
    filter(shot_dist < 6.75) %>%
    mutate(
                # Categorisation allowing for small degree of human error when coding 3-pt attempt locations
        shot_category = case_when(
                        # Recorded location is erroneous when shot distance < 6 m from ring
            shot_dist < 6   ~ "erroneous",
                        # Recorded location is accepted when shot distance >= 6 m from ring
            shot_dist >= 6 ~ "accepted")) %>%
    arrange(
        desc(shot_dist)
    ) -> short_3pt_shots

# Set court features
add_forward_goal_circle <- function() geom_arc_bar(
    aes(x0 = 50, y0 = 200, r0 = 0, r = 4.9/0.1525,
            start = pi / 2, end = 3 / 2 * pi),
    inherit.aes = FALSE)
add_key <- function() geom_rect(
    xmin = 0, xmax = 5.8, ymin = 9.95, ymax = 5.05, col = "black", fill = NA)
add_3pt_arc <- function() geom_arc(
    aes(x0 = 1.575, y0 = 7.5, r = 6.75,
            start = pi - 0.1862454, end = 0.1862454),
    col = "black", inherit.aes = FALSE)
add_halfcourt <- function() geom_rect(
    xmin = 0, xmax = 14, ymin = 0, ymax = 15, col = "black",
    fill = NA, inherit.aes = FALSE)
add_top_key <- function() geom_arc(
    aes(x0 = 5.8, y0 = 7.5, r = 1.8, start = pi, end = 0),
    col = "black", inherit.aes = FALSE)
add_backboard <- function() geom_segment(
    aes(x = 1.2, xend = 1.2, y = 6.6, yend = 8.4), col = "black")
add_basket <- function() geom_circle(
    aes(x0 = 1.675, y0 = 7.5, r = 0.45/2), col = "black", fill = NA,
    inherit.aes = FALSE)
add_centre_circle <- function() geom_arc(
    aes(x0 = 14, y0 = 7.5, r = 1.8, start = pi, end = 2*pi),
    col = "black", inherit.aes = FALSE)
add_3ball_segment1 <- function() geom_segment(
    aes(x = 0, xend = 2.825, y = 0.9, yend = 0.9), inherit.aes = FALSE)
add_3ball_segment2 <- function() geom_segment(
    aes(x = 0, xend = 2.825, y = 14.1, yend = 14.1), inherit.aes = FALSE)

# Plot
p <- ggplot(
    short_3pt_shots, aes(x = x_adj_halfcourt, y = y_adj))
p <- p + add_key()
p <- p + add_top_key()
p <- p + add_halfcourt()
p <- p + add_backboard()
p <- p + add_basket()
p <- p + add_3ball_segment1()
p <- p + add_3ball_segment2()
p <- p + add_3pt_arc()
p <- p + add_centre_circle()
p <- p + geom_point(
    aes(colour = shot_category),
    stat = "identity", size = 3, alpha = 0.5)
p <- p + scale_colour_manual(
    values = c(
        "accepted" = "dodgerblue",
        "erroneous" = "red"))
p <- p + coord_fixed(xlim = c(0,14), ylim = c(0,15))
p <- p + facet_wrap(~season, nrow = 2)
p <- p + theme_void()

image

jacquietran commented 2 years ago

Observations

jacquietran commented 2 years ago

Looking only at the 3-pt shot locations I have categorised as "erroneous" (allowing for some human error in specifying XY location)


short_3pt_shots_erroneous <- short_3pt_shots %>%
    filter(shot_category == "erroneous")

# Plot
p <- ggplot(
    short_3pt_shots_erroneous, aes(x = x_adj_halfcourt, y = y_adj))
p <- p + add_key()
p <- p + add_top_key()
p <- p + add_halfcourt()
p <- p + add_backboard()
p <- p + add_basket()
p <- p + add_3ball_segment1()
p <- p + add_3ball_segment2()
p <- p + add_3pt_arc()
p <- p + add_centre_circle()
p <- p + geom_point(
    colour = "red", stat = "identity", size = 3, alpha = 0.5)
p <- p + coord_fixed(xlim = c(0,14), ylim = c(0,15))
p <- p + facet_wrap(~season, nrow = 2)
p <- p + theme_void()

image


short_3pt_shots_erroneous %>%
    select(
        season, page_id, action_number, team_name, team_name_opp,
        shot_result, x, y, x_adj_halfcourt, y_adj, shot_dist, scoreboard_name) %>%
    arrange(desc(season), desc(shot_dist)) %>%
    gt::gt()

image

jacquietran commented 2 years ago

WNBL: 2021 / 2022

page_id = 2061074, action_number = 631

page_id = 2061074, action_number = 642

page_id = 1997422

data tidying to do for all of the above:

Note: In my checking process, I looked through the pbp data as well and there may be some issues potentially with the action numbers - they might not always be in chronological order??? Will investigate further and open a separate issue about this if needed.

jacquietran commented 2 years ago

2020

page_id = 1777499

Couldn't find video of the full game (maybe accessible via Kayo as VOD, but I don't have a subscription). Videos online (on Youtube and Facebook) are highlight reels only and do not include this particular shot, taken at the end of the 1st quarter.

jacquietran commented 2 years ago

2019

page_id = 1330488

Couldn't find video of the full game (maybe accessible via Kayo as VOD, but I don't have a subscription).

jacquietran commented 2 years ago

2018

page_id = 1087577

Couldn't find video of the full game.

jacquietran commented 2 years ago

2017

page_id == 681961

page_id == 681927

Very short shot distances

There are a set of distances recorded with action_type == "3pt" but are quite close to the ring. All of these shots have an XY location that is < 2 m from the ring:

For each of the above, data tidying to do:

jacquietran commented 2 years ago

2015

page_id == 137266

page_id == 137248

Very short shot distances

These shots have an XY location that is < 2 m from the ring:

For each of the above, data tidying to do:

jacquietran commented 2 years ago

2014

page_id = 64536

Short shot distances

These shots have an XY location that is inside the key:

For each of the above, data tidying to do:

jacquietran commented 2 years ago

summarising the checks above:

image

or simpler, focusing on shots to be updated only:


short_3pt_shots_erroneous %>%
    select(
        season, page_id, action_number, team_name, team_name_opp, shot_dist,
        scoreboard_name) %>%
    arrange(desc(season), desc(shot_dist)) %>%
    mutate(
        to_be_updated = case_when(
            page_id %in% c(
                1997422,
                1330488,
                1087577,
                681947,
                681956,
                681960,
                681968,
                681977,
                137261,
                137309,
                64580,
                64597)               ~ TRUE,
            page_id == 681958 &
                action_number == 107 ~ TRUE,
            page_id == 681958 &
                action_number == 188 ~ TRUE,
            page_id == 681958 &
                action_number == 319 ~ TRUE,
            page_id == 681974 &
                action_number == 306 ~ TRUE,
            page_id == 681974 &
                action_number == 359 ~ TRUE,
            TRUE                   ~ FALSE)) %>%
    filter(to_be_updated == TRUE) %>%
    gt::gt()

image

For each of the shots listed in the screenshot immediately above, data tidying to do:

jacquietran commented 2 years ago

^ oops, that last commit message was meant to read "in 2017 games"

jacquietran commented 2 years ago

This will be ready to close when the beta branch of {wnblr} is merged.