jimmyday12 / fitzRoy

A set of functions to easily access AFL data
https://jimmyday12.github.io/fitzRoy

3 games missing from Round 20 2022 using fetch_player_stats_afltables #181

Closed insightlane closed 2 years ago

insightlane commented 2 years ago


I noticed that some 2022 player goal tallies were off. It turns out that Round 20, 2022 is missing three games of data: the query below returns six games for that round instead of nine.


library(tibble)
library(dplyr)
library(fitzRoy)

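# Pull every season from the AFL Tables player stats
afltables_data <- fetch_player_stats_afltables(season = 1897:2022)

# Each game has exactly one home team, so the number of distinct
# home teams in a round equals the number of games in that round
afltables_data %>%
  filter(Season == 2022) %>%
  mutate(Round = as.integer(Round)) %>%
  group_by(Season, Round) %>%
  summarise(games = n_distinct(Home.team)) %>%
  arrange(Round) %>%
  print(n = Inf)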

   Season Round games
    <dbl> <int> <int>
 1   2022     1     9
 2   2022     2     9
 3   2022     3     9
 4   2022     4     9
 5   2022     5     9
 6   2022     6     9
 7   2022     7     9
 8   2022     8     9
 9   2022     9     9
10   2022    10     9
11   2022    11     9
12   2022    12     6
13   2022    13     6
14   2022    14     6
15   2022    15     9
16   2022    16     9
17   2022    17     9
18   2022    18     9
19   2022    19     9
20   2022    20     6
21   2022    21     9
22   2022    22     9
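
The per-round count above can also be turned into an automatic check. Below is a minimal sketch in base R on made-up data: the `toy` table, its team names, and the `expected_games` threshold are all hypothetical stand-ins, not fitzRoy output. Note that a fixed threshold would also flag rounds 12 to 14 in the table above, which look like bye rounds rather than missing data, so this is only a first pass.

```r
# Hypothetical toy data: one row per player per game, mimicking the
# Round and Home.team columns used in the query above.
toy <- data.frame(
  Round     = c(1, 1, 1, 1, 2, 2),
  Home.team = c("Carlton", "Carlton", "Geelong", "Geelong",
                "Carlton", "Carlton"),
  stringsAsFactors = FALSE
)

# Count distinct home teams per round (one home team per game).
games <- tapply(toy$Home.team, toy$Round, function(x) length(unique(x)))

# Hypothetical threshold: 2 for this toy data; a full 18-team round
# would have nine games.
expected_games <- 2

# Rounds with fewer games than expected.
short_rounds <- as.integer(names(games)[games < expected_games])
print(short_rounds)
```

Any round listed in `short_rounds` is a candidate for a closer look, either a bye round or a gap in the data.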

jimmyday12 commented 2 years ago

Strange - thanks for picking it up mate. I'll take a look this week.

jimmyday12 commented 2 years ago

@insightlane I just realised that a new argument I added last week, rescrape = TRUE, should help here in the short term.

You'll need to install the dev version from GitHub.
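
One common way to install a development version from GitHub is sketched here. It assumes the `remotes` package is an acceptable installer (the thread does not name one) and takes the repository path from the page header.

```r
# Assumption: remotes is used as the installer; the thread does not
# specify a tool. Install it first if it is not already available.
if (!requireNamespace("remotes", quietly = TRUE)) {
  install.packages("remotes")
}

# Install the development version of fitzRoy from its GitHub repository.
remotes::install_github("jimmyday12/fitzRoy")
```

Restart the R session afterwards so that `library(fitzRoy)` picks up the new version.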

library(tibble)
library(dplyr)
library(fitzRoy)

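# Re-scrape from the 2022 season onward rather than relying solely on the cached data
afltables_data <- fetch_player_stats_afltables(
  season = 1897:2022,
  rescrape = TRUE,
  rescrape_start_season = 2022
)

# Same check as before: distinct home teams per round = games per round
afltables_data %>%
  filter(Season == 2022) %>%
  mutate(Round = as.integer(Round)) %>%
  group_by(Season, Round) %>%
  summarise(games = n_distinct(Home.team)) %>%
  arrange(Round) %>%
  print(n = Inf)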

#> # A tibble: 22 × 3
#> # Groups:   Season [1]
#>    Season Round games
#>     <dbl> <int> <int>
#>  1   2022     1     9
#>  2   2022     2     9
#>  3   2022     3     9
#>  4   2022     4     9
#>  5   2022     5     9
#>  6   2022     6     9
#>  7   2022     7     9
#>  8   2022     8     9
#>  9   2022     9     9
#> 10   2022    10     9
#> 11   2022    11     9
#> 12   2022    12     6
#> 13   2022    13     6
#> 14   2022    14     6
#> 15   2022    15     9
#> 16   2022    16     9
#> 17   2022    17     9
#> 18   2022    18     9
#> 19   2022    19     9
#> 20   2022    20     9
#> 21   2022    21     9
#> 22   2022    22     9

I'll have to update the cached data eventually so that you don't need to rescrape, but if it helps for now, that could be an option.