jimmyday12 / fitzRoy

A set of functions to easily access AFL data
https://jimmyday12.github.io/fitzRoy

fetch_player_stats_footywire does not return complete data #144

Closed hamgamb closed 3 years ago

hamgamb commented 3 years ago
library(fitzRoy)
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
fetch_player_stats_footywire(season = 2020) %>% 
    filter(Round == "Grand Final")
#> i Getting match IDs
#> v Getting match IDs ... done
#> i Checking data on <https://github.com/jimmyday12/fitzRoy>
#> v Checking data on <https://github.com/jimmyday12/fitzRoy> ... done
#> i No new matches found - returning data cached on github
#> # A tibble: 4 x 45
#>   Date       Season Round  Venue Player   Team  Opposition Status Match_id    CP
#>   <date>      <dbl> <chr>  <chr> <chr>    <chr> <chr>      <chr>     <dbl> <int>
#> 1 2020-10-24   2020 Grand~ Gabba Dion Pr~ Rich~ Geelong    Home      10326     7
#> 2 2020-10-24   2020 Grand~ Gabba David A~ Rich~ Geelong    Home      10326     2
#> 3 2020-10-24   2020 Grand~ Gabba Gryan M~ Geel~ Richmond   Away      10326     4
#> 4 2020-10-24   2020 Grand~ Gabba Brandan~ Geel~ Richmond   Away      10326     4
#> # ... with 35 more variables: UP <int>, ED <int>, DE <dbl>, CM <int>, GA <int>,
#> #   MI5 <int>, One.Percenters <int>, BO <int>, TOG <int>, K <int>, HB <int>,
#> #   D <int>, M <int>, G <int>, B <int>, T <int>, HO <int>, GA1 <int>,
#> #   I50 <int>, CL <int>, CG <int>, R50 <int>, FF <int>, FA <int>, AF <int>,
#> #   SC <int>, CCL <int>, SCL <int>, SI <int>, MG <int>, TO <int>, ITC <int>,
#> #   T5 <int>, GA...15 <int>, GA...35 <int>

Created on 2021-03-04 by the reprex package (v1.0.0)

Compared to data for the same game from fryzigg

library(fitzRoy)
library(dplyr)

#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

fitzRoy::fetch_player_stats_fryzigg(season = 2020) %>%
 filter(date == "2020-10-24")
#> i Returning cached AFLM data from 2020
#> v Returning cached AFLM data from 2020 ... done
#> # A tibble: 44 x 81
#>    venue_name match_id match_home_team match_away_team match_date
#>    <chr>         <int> <chr>           <chr>           <chr>     
#>  1 Gabba         15874 Richmond        Geelong         2020-10-24
#>  2 Gabba         15874 Richmond        Geelong         2020-10-24
#>  3 Gabba         15874 Richmond        Geelong         2020-10-24
#>  4 Gabba         15874 Richmond        Geelong         2020-10-24
#>  5 Gabba         15874 Richmond        Geelong         2020-10-24
#>  6 Gabba         15874 Richmond        Geelong         2020-10-24
#>  7 Gabba         15874 Richmond        Geelong         2020-10-24
#>  8 Gabba         15874 Richmond        Geelong         2020-10-24
#>  9 Gabba         15874 Richmond        Geelong         2020-10-24
#> 10 Gabba         15874 Richmond        Geelong         2020-10-24
#> # ... with 34 more rows, and 76 more variables: match_local_time <chr>,
#> #   match_attendance <int>, match_round <chr>, match_home_team_goals <int>,
#> #   match_home_team_behinds <int>, match_home_team_score <int>,
#> #   match_away_team_goals <int>, match_away_team_behinds <int>,
#> #   match_away_team_score <int>, match_margin <int>, match_winner <chr>,
#> #   match_weather_temp_c <int>, match_weather_type <chr>, player_id <int>,
#> #   player_first_name <chr>, player_last_name <chr>, player_height_cm <int>,
#> #   player_weight_kg <int>, player_is_retired <lgl>, player_team <chr>,
#> #   guernsey_number <int>, kicks <int>, marks <int>, handballs <int>,
#> #   disposals <int>, effective_disposals <int>,
#> #   disposal_efficiency_percentage <int>, goals <int>, behinds <int>,
#> #   hitouts <int>, tackles <int>, rebounds <int>, inside_fifties <int>,
#> #   clearances <int>, clangers <int>, free_kicks_for <int>,
#> #   free_kicks_against <int>, brownlow_votes <int>,
#> #   contested_possessions <int>, uncontested_possessions <int>,
#> #   contested_marks <int>, marks_inside_fifty <int>, one_percenters <int>,
#> #   bounces <int>, goal_assists <int>, time_on_ground_percentage <int>,
#> #   afl_fantasy_score <int>, supercoach_score <int>, centre_clearances <int>,
#> #   stoppage_clearances <int>, score_involvements <int>, metres_gained <int>,
#> #   turnovers <int>, intercepts <int>, tackles_inside_fifty <int>,
#> #   contest_def_losses <int>, contest_def_one_on_ones <int>,
#> #   contest_off_one_on_ones <int>, contest_off_wins <int>,
#> #   def_half_pressure_acts <int>, effective_kicks <int>,
#> #   f50_ground_ball_gets <int>, ground_ball_gets <int>,
#> #   hitouts_to_advantage <int>, hitout_win_percentage <dbl>,
#> #   intercept_marks <int>, marks_on_lead <int>, pressure_acts <int>,
#> #   rating_points <dbl>, ruck_contests <int>, score_launches <int>,
#> #   shots_at_goal <int>, spoils <int>, subbed <chr>, player_position <chr>,
#> #   date <date>

Created on 2021-03-04 by the reprex package (v1.0.0)

jimmyday12 commented 3 years ago

Thanks for the issue and the reprex!

Can I ask what it is that you are expecting here, and why you think the Footywire data is incomplete? The two functions pull from different data sources, so the results won't match. Fryzigg data is always different to Footywire data.

Is there something specific missing from the Footywire data, or is it more that you expected the two functions to return the same data?

hamgamb commented 3 years ago

I would have expected that player stats data should have data for every player within a game. Sorry that I didn't make that clear - my reprex intended to show the differences in the number of rows returned, rather than the columns.

The footywire player stats for last year's grand final only have data for 4 players, compared to the 44 players in the fryzigg data.
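A quick way to spot this kind of gap is to count rows per match and flag any match with fewer rows than expected. The following is only a minimal sketch on toy data: `find_incomplete_matches` is a hypothetical helper (not part of fitzRoy), the `Match_id` column name is taken from the footywire output above, the match ID 10325 is made up, and the default of 44 is simply the row count fryzigg returned for this game.

```r
library(dplyr)

# Hypothetical helper (not a fitzRoy function): return the matches that
# have fewer player rows than `expected`.
find_incomplete_matches <- function(stats, expected = 44) {
  stats %>%
    count(Match_id, name = "n_players") %>%  # one row per match with its row count
    filter(n_players < expected)             # keep only the short matches
}

# Toy data standing in for the output of fetch_player_stats_footywire():
# one complete match (44 rows) and one truncated match (4 rows).
toy_stats <- tibble(
  Match_id = c(rep(10325, 44), rep(10326, 4)),
  Player   = paste("Player", seq_len(48))
)

incomplete <- find_incomplete_matches(toy_stats)
print(incomplete)
```

With real data, the same helper could be applied to the tibble returned by `fetch_player_stats_footywire(season = 2020)`.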

jimmyday12 commented 3 years ago

Ahhhhh nice. Sorry I misread that as 44! I'll have to take a look at what is going on there.

patrickhalberstram commented 3 years ago

As far as I can tell, @jimmyday12, dat_url2 <- "https://github.com/jimmyday12/fitzroy_data/raw/master/data-raw/player_stats/player_stats.rda" in your update_footywire_stats function is the cause of the issue: the rda file only has ~8k rows but 2199 unique match IDs, which works out to fewer than 4 rows per match.
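The rows-per-match check can be sketched as below. This is an illustration under assumptions, not the package's own code: `summarise_rda` is a hypothetical helper, the `Match_id` column name is assumed from the footywire output earlier in the thread, and a small temporary file stands in for the real player_stats.rda so that the example runs offline.

```r
# Hypothetical helper: load an .rda file into a private environment and
# report total rows, unique match IDs, and average rows per match.
summarise_rda <- function(path, id_col = "Match_id") {
  env <- new.env()
  obj_names <- load(path, envir = env)      # load() returns the names of the loaded objects
  dat <- get(obj_names[1], envir = env)     # assumes the first object is the data frame
  n_matches <- length(unique(dat[[id_col]]))
  list(
    rows = nrow(dat),
    matches = n_matches,
    rows_per_match = nrow(dat) / n_matches
  )
}

# Toy file standing in for player_stats.rda: 3 matches with 4 rows each.
player_stats <- data.frame(Match_id = rep(1:3, each = 4), CP = 1:12)
tmp <- tempfile(fileext = ".rda")
save(player_stats, file = tmp)

res <- summarise_rda(tmp)
print(res$rows_per_match)
```

A complete match should have around 44 rows, so an average far below that points at truncated data in the file rather than at the filtering step. To inspect the real file, it would need to be downloaded first (or opened as a connection) and its object name checked.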

jimmyday12 commented 3 years ago

Hmmm there is a bug that I introduced when I renamed the master branch to main.

I wonder if installing the development version fixes this.

https://github.com/jimmyday12/fitzRoy/issues/142

hamgamb commented 3 years ago

I think I was using the development version already, but just re-installed to double check. Same deal.

library(fitzRoy)
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

footywire_rows <- fetch_player_stats_footywire(season = 2020) %>%
  filter(Round == "Grand Final") %>%
  nrow()
#> i Getting match IDs
#> v Getting match IDs ... done
#> i Checking data on <https://github.com/jimmyday12/fitzRoy>
#> v Checking data on <https://github.com/jimmyday12/fitzRoy> ... done
#> i No new matches found - returning data cached on github

footywire_rows
#> [1] 4

packageVersion("fitzRoy")
#> [1] '0.3.3.9000'

Created on 2021-03-10 by the reprex package (v1.0.0)
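The '0.3.3.9000' version string above follows the common R convention of marking development builds with a fourth component of 9000 or higher. As a rough sketch, that check can be automated; `is_dev_version` is a hypothetical helper, and the 9000 threshold is a convention assumed here, not something fitzRoy enforces.

```r
# Hypothetical helper: TRUE when a version string has a fourth component
# of 9000 or more, the usual marker for an in-development build.
is_dev_version <- function(v) {
  parts <- unclass(package_version(v))[[1]]  # integer vector of version components
  length(parts) >= 4 && parts[4] >= 9000
}

print(is_dev_version("0.3.3.9000"))
print(is_dev_version("0.3.3"))

# Development builds also sort after the release they are based on.
print(package_version("0.3.3.9000") > package_version("0.3.3"))
```

On an installed package, the same helper could be called as `is_dev_version(as.character(packageVersion("fitzRoy")))`.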

jimmyday12 commented 3 years ago

Damn. I will try to take a look at it this weekend. I'm planning to do a CRAN release at some point over the weekend, so hopefully I can fix this up.

It actually looks like a data issue rather than a problem with the package itself (I keep a separate repo over at https://github.com/jimmyday12/fitzroy_data to house some historical data). As @patrickhalberstram points out, the data in that repo looks to be wrong.

jimmyday12 commented 3 years ago

I think this should be fixed now - @hamgamb if you could check that would be great!