jimmyday12 / fitzRoy

A set of functions to easily access AFL data
https://jimmyday12.github.io/fitzRoy

Incorrect data parsing with fetch_results_afltables() #161

Closed: mprofumo closed this issue 2 years ago

mprofumo commented 3 years ago

It looks like there is an issue parsing the raw data from AFL Tables: scores are ending up in the team-name columns, e.g. `Away.Team` being "13.7.85".

Thanks for the hard work!



devtools::install_github("jimmyday12/fitzRoy")
results <- fetch_results_afltables(2021)
unique(results$Away.Team)

#> Skipping install of 'fitzRoy' from a github remote, the SHA1 (a1ac26a9) has not changed since last install.
#>   Use `force = TRUE` to force installation
#> 
#> Warning message:
#> “13286 parsing failures.
#> row col  expected     actual                                               file
#>   1  -- 8 columns 9 columns  'https://afltables.com/afl/stats/biglists/bg3.txt'
#>   2  -- 8 columns 10 columns 'https://afltables.com/afl/stats/biglists/bg3.txt'
#>   3  -- 8 columns 9 columns  'https://afltables.com/afl/stats/biglists/bg3.txt'
#>   4  -- 8 columns 10 columns 'https://afltables.com/afl/stats/biglists/bg3.txt'
#>   5  -- 8 columns 10 columns 'https://afltables.com/afl/stats/biglists/bg3.txt'
#> ... ... ......... .......... ..................................................
#> See problems(...) for more details.
#> ”
#> Warning message:
#> “Expected 3 pieces. Missing pieces filled with `NA` in 4514 rows [4, 5, 7, 12, 14, 19, 21, 23, 28, 30, 40, 44, 48, 55, 66, 69, 73, 76, 82, 84, ...].”
#> Warning message:
#> “Expected 3 pieces. Missing pieces filled with `NA` in 7912 rows [2, 4, 5, 7, 12, 14, 16, 19, 20, 21, 23, 27, 28, 30, 32, 33, 35, 40, 42, 44, ...].”
#> [1] "Unique team names:"
#> 'Carlton' 'Western' 'Fremantle' 'Geelong' 'Hawthorn' '14.10.94' '9.11.65' '11.12.78' '12.11.83' 'Collingwood' 'Brisbane Lions' 'Adelaide' '18.11.119' '11.7.73' '14.14.98' 'Richmond' '14.16.100' 'GW' '5.9.39' 'Gold' 'Sydney' 'St' '16.12.108' '11.2.68' 'Essendon' '11.13.79' '10.13.73' '15.12.102' '8.11.59' '10.8.68' '7.6.48' '16.7.103' '17.16.118' 'Port' 'Melbourne' 'North' 'West' '15.10.100' '14.9.93' '19.14.128' '13.15.93' '20.12.132' '7.12.54' '16.11.107' '11.10.76' '12.15.87' '14.11.95' '5.17.47' '7.9.51' '12.5.77' '16.10.106' '21.18.144' '19.15.129' '12.16.88' '11.5.71' '17.11.113' '18.7.115' '14.7.91' '8.12.60' '13.7.85' '4.7.31' '6.9.45' '16.6.102' '13.16.94' '9.18.72' '12.9.81' '11.6.72' '6.7.43' '10.17.77' '8.7.55' '8.15.63' '9.10.64' '8.13.61' '11.8.74' '14.13.97' '17.18.120' '15.15.105' '6.6.42' '4.6.30' '12.12.84' '9.9.63' '21.14.140' '22.10.142' '10.4.64' '19.11.125' '17.5.107' '12.14.86'
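One quick way to spot this symptom is to flag team-name values that look like an AFL score written as goals.behinds.points. The sketch below is a hypothetical check (`looks_like_score()` is an illustrative helper, not part of fitzRoy); besides matching the pattern, it confirms that points = 6 × goals + behinds, so only genuine score strings are flagged.

```r
# Hypothetical helper (not part of fitzRoy): TRUE when a value looks like an
# AFL score written as goals.behinds.points AND the points add up
# (a goal is worth 6 points, a behind 1 point).
looks_like_score <- function(x) {
  x <- as.character(x)
  is_pattern <- grepl("^[0-9]+\\.[0-9]+\\.[0-9]+$", x)
  parts <- strsplit(x, ".", fixed = TRUE)
  adds_up <- vapply(parts, function(p) {
    if (length(p) != 3) return(FALSE)
    n <- suppressWarnings(as.integer(p))
    !anyNA(n) && n[3] == 6L * n[1] + n[2]
  }, logical(1))
  is_pattern & adds_up
}

# A few values copied from the unique(results$Away.Team) output above
away <- c("Carlton", "Western", "14.10.94", "Brisbane Lions", "18.11.119")
away[looks_like_score(away)]  # keeps only the score-like values
```

On a full result set, something like `sum(looks_like_score(results$Away.Team))` would count the affected rows.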
zeldir commented 3 years ago

Yeah I get this too. I think it has something to do with the venues having inconsistent whitespace. I wrote some Python code that sucks it out correctly.
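The truncated names in the output above ('Western', 'St', 'Gold', 'Port', 'North', 'West') are consistent with multi-word names being split on single spaces. As an illustration only, assuming fields in a line are separated by runs of two or more spaces (a simplifying assumption; the real layout of bg3.txt may differ), splitting on that run keeps names like "St Kilda" intact. `split_fields()` and the sample line are hypothetical.

```r
# Hypothetical helper (not part of fitzRoy). Assumes fields are separated by
# runs of two or more spaces, so single spaces inside names are preserved.
split_fields <- function(line) {
  fields <- strsplit(trimws(line), " {2,}")[[1]]
  fields[nzchar(fields)]
}

# Invented line in a simplified layout (not a real result, not from bg3.txt)
line <- "Western Bulldogs  1.2.8  St Kilda  3.4.22  Example Oval"

strsplit(line, " ", fixed = TRUE)[[1]]  # naive split breaks the names apart
split_fields(line)
# c("Western Bulldogs", "1.2.8", "St Kilda", "3.4.22", "Example Oval")
```

This heuristic fails wherever two adjacent fields are separated by a single space (e.g. a long name filling its column), so reading with explicit fixed-width positions is the more robust approach.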

mprofumo commented 3 years ago

I also found a temporary workaround: comment out the explicit column types passed to `readr::read_fwf()`:

match_data <- readr::read_fwf(url_text, 
                skip = 2, 
                col_positions = cols, 
                # col_types = c("dcccccccc")
                )
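A related option, sketched here under assumptions, is to keep explicit column positions but read every column as character first and convert afterwards, so a shifted column shows up as a failed check rather than silently mistyped data. `read_fwf()`, `fwf_widths()` and `cols(.default = col_character())` are real readr functions; the widths, column names and two sample lines are invented and are not the real bg3.txt layout.

```r
library(readr)

# Invented fixed-width sample: an 18-character team column followed by an
# 8-character score column (NOT the real bg3.txt layout).
tmp <- tempfile(fileext = ".txt")
writeLines(c(
  "Richmond          13.18.96",
  "Western Bulldogs  14.10.94"
), tmp)

raw <- read_fwf(
  tmp,
  col_positions = fwf_widths(c(18, 8), c("team", "score")),
  col_types = cols(.default = col_character())  # read everything as text first
)

# Split "goals.behinds.points" into integer columns after reading
parts <- do.call(rbind, strsplit(raw$score, ".", fixed = TRUE))
raw$goals   <- as.integer(parts[, 1])
raw$behinds <- as.integer(parts[, 2])
raw$points  <- as.integer(parts[, 3])

# Sanity check: a goal is worth 6 points and a behind 1 point
stopifnot(all(raw$points == 6L * raw$goals + raw$behinds))
```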
jimmyday12 commented 2 years ago

Hi @mprofumo, I'm not able to replicate this locally:

library(fitzRoy)
results <- fetch_results_afltables(2021)
unique(results$Away.Team)
#>  [1] "Carlton"         "Footscray"       "Fremantle"       "Geelong"        
#>  [5] "Hawthorn"        "Sydney"          "Port Adelaide"   "St Kilda"       
#>  [9] "Gold Coast"      "Collingwood"     "Brisbane Lions"  "Adelaide"       
#> [13] "Essendon"        "Melbourne"       "North Melbourne" "Richmond"       
#> [17] "West Coast"      "GWS"

Created on 2021-10-14 by the reprex package (v2.0.0)

Session info

``` r
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.0.3 (2020-10-10)
#>  os       macOS Big Sur 10.16
#>  system   x86_64, darwin17.0
#>  ui       X11
#>  language (EN)
#>  collate  en_AU.UTF-8
#>  ctype    en_AU.UTF-8
#>  tz       Australia/Melbourne
#>  date     2021-10-14
#>
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package     * version    date       lib source
#>  assertthat    0.2.1      2019-03-21 [1] CRAN (R 4.0.2)
#>  backports     1.2.1      2020-12-09 [1] CRAN (R 4.0.2)
#>  bit           4.0.4      2020-08-04 [1] CRAN (R 4.0.2)
#>  bit64         4.0.5      2020-08-30 [1] CRAN (R 4.0.2)
#>  cli           3.0.1      2021-07-17 [1] CRAN (R 4.0.2)
#>  crayon        1.4.1      2021-02-08 [1] CRAN (R 4.0.2)
#>  curl          4.3.2      2021-06-23 [1] CRAN (R 4.0.2)
#>  DBI           1.1.1      2021-01-15 [1] CRAN (R 4.0.2)
#>  digest        0.6.27     2020-10-24 [1] CRAN (R 4.0.2)
#>  dplyr         1.0.7      2021-06-18 [1] CRAN (R 4.0.2)
#>  ellipsis      0.3.2      2021-04-29 [1] CRAN (R 4.0.2)
#>  evaluate      0.14       2019-05-28 [1] CRAN (R 4.0.1)
#>  fansi         0.5.0      2021-05-25 [1] CRAN (R 4.0.2)
#>  fitzRoy     * 1.0.0.9000 2021-10-14 [1] Github (jimmyday12/fitzRoy@1b0e390)
#>  fs            1.5.0      2020-07-31 [1] CRAN (R 4.0.2)
#>  generics      0.1.0      2020-10-31 [1] CRAN (R 4.0.2)
#>  glue          1.4.2      2020-08-27 [1] CRAN (R 4.0.2)
#>  highr         0.9        2021-04-16 [1] CRAN (R 4.0.2)
#>  hms           1.1.0      2021-05-17 [1] CRAN (R 4.0.2)
#>  htmltools     0.5.1.1    2021-01-22 [1] CRAN (R 4.0.2)
#>  knitr         1.33       2021-04-24 [1] CRAN (R 4.0.2)
#>  lifecycle     1.0.0      2021-02-15 [1] CRAN (R 4.0.3)
#>  lubridate     1.7.10     2021-02-26 [1] CRAN (R 4.0.2)
#>  magrittr      2.0.1      2020-11-17 [1] CRAN (R 4.0.2)
#>  pillar        1.6.1      2021-05-16 [1] CRAN (R 4.0.2)
#>  pkgconfig     2.0.3      2019-09-22 [1] CRAN (R 4.0.2)
#>  purrr         0.3.4      2020-04-17 [1] CRAN (R 4.0.2)
#>  R.cache       0.15.0     2021-04-30 [1] CRAN (R 4.0.2)
#>  R.methodsS3   1.8.1      2020-08-26 [1] CRAN (R 4.0.2)
#>  R.oo          1.24.0     2020-08-26 [1] CRAN (R 4.0.2)
#>  R.utils       2.10.1     2020-08-26 [1] CRAN (R 4.0.2)
#>  R6            2.5.0      2020-10-28 [1] CRAN (R 4.0.2)
#>  Rcpp          1.0.7      2021-07-07 [1] CRAN (R 4.0.2)
#>  readr         2.0.0      2021-07-20 [1] CRAN (R 4.0.2)
#>  reprex        2.0.0      2021-04-02 [1] CRAN (R 4.0.2)
#>  rlang         0.4.11     2021-04-30 [1] CRAN (R 4.0.2)
#>  rmarkdown     2.9        2021-06-15 [1] CRAN (R 4.0.2)
#>  rstudioapi    0.13       2020-11-12 [1] CRAN (R 4.0.2)
#>  sessioninfo   1.1.1      2018-11-05 [1] CRAN (R 4.0.2)
#>  stringi       1.7.3      2021-07-16 [1] CRAN (R 4.0.2)
#>  stringr       1.4.0      2019-02-10 [1] CRAN (R 4.0.2)
#>  styler        1.4.1      2021-03-30 [1] CRAN (R 4.0.2)
#>  tibble        3.1.3      2021-07-23 [1] CRAN (R 4.0.2)
#>  tidyr         1.1.3      2021-03-03 [1] CRAN (R 4.0.2)
#>  tidyselect    1.1.1      2021-04-30 [1] CRAN (R 4.0.2)
#>  tzdb          0.1.2      2021-07-20 [1] CRAN (R 4.0.2)
#>  utf8          1.2.1      2021-03-12 [1] CRAN (R 4.0.3)
#>  vctrs         0.3.8      2021-04-29 [1] CRAN (R 4.0.2)
#>  vroom         1.5.3      2021-07-14 [1] CRAN (R 4.0.2)
#>  withr         2.4.2      2021-04-18 [1] CRAN (R 4.0.2)
#>  xfun          0.24       2021-06-15 [1] CRAN (R 4.0.2)
#>  yaml          2.2.1      2020-02-01 [1] CRAN (R 4.0.2)
#>
#> [1] /Library/Frameworks/R.framework/Versions/4.0/Resources/library
```

Are you able to confirm that this is still an issue for you? If so, I might need you to produce another reprex with your session info (you can do this with reprex::reprex(session_info = TRUE) after you've copied your code).

jimmyday12 commented 2 years ago

Hi @mprofumo @zeldir, I'm just reviewing some old bugs and noticed I can't reproduce this. If you could confirm whether it is still an issue (ideally with a reproducible example), that'd be awesome.

It may have been fixed by some related work.

zeldir commented 2 years ago

Hi Jimmy,

I've updated everything and am now using the new commands, and this works well:

res <- fetch_results(2010:2021)
res <- apply(res, 2, as.character)
write.csv(res, "results_afl.csv")
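Note that `apply()` first coerces a data frame to a matrix (via `as.matrix()`), so the code above writes out a character matrix rather than a data frame. If you would rather keep the data frame while still converting every column to text, one minimal alternative is to convert column by column with `lapply()`. The toy data frame below is a made-up stand-in for the output of `fetch_results()`; its column names are hypothetical.

```r
# Made-up stand-in for the output of fetch_results() (hypothetical columns)
res <- data.frame(
  round = c(1L, 2L),
  home_team = c("Team A", "Team B"),
  home_points = c(80, 95),
  stringsAsFactors = FALSE
)

# Convert every column to character; `res[] <-` keeps the data frame structure
res[] <- lapply(res, as.character)
str(res)

# Write to a temporary file here; use a real path such as "results_afl.csv" in practice
out <- tempfile(fileext = ".csv")
write.csv(res, out, row.names = FALSE)
```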

I think I'll move to the AFL data because it's consistent with the fixture.

Thanks!
Lach

