robjhyndman / cricketdata

International cricket data for men and women, Tests, ODIs and T20s
http://pkg.robjhyndman.com/cricketdata/

player & match df has match ids in an unusual format #21

Closed: Dazzalytics closed this issue 2 years ago

Dazzalytics commented 2 years ago

First up, great work in making this package.

I was working with PSL data and noticed that the player and match datasets store match ids as long file paths. For example, C:\Users\AppData\Local\Temp\Rtmpuu2sV2/psl_male_bbb/1075986 shows up as a match id in the data, where 1075986 is the actual match id.

I wrote the simple function below to fix it for myself, but there is probably a way to fix it in the package code. Thank you!

# Keep the third "/"-separated piece of each match_id (the numeric id in the path above)
clean_match_id <- function(df) { sapply(strsplit(df$match_id, "/"), "[", 3) }
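The helper above relies on the id being the third "/"-separated piece, which holds for the path quoted here but would change if the temporary directory were written with a different number of forward slashes. A slightly more defensive sketch is to keep whatever follows the last "/" or "\". The function name `extract_match_id` and the small `players` data frame are hypothetical, used only for illustration; they are not part of cricketdata.

```r
# Hypothetical helper, not part of cricketdata: keep only the text after the
# last "/" or "\" in each match_id. Values with no separator are unchanged.
extract_match_id <- function(match_id) {
  sub(".*[/\\\\]", "", match_id)
}

# Illustrative data frame that reuses the path quoted above.
players <- data.frame(
  player = c("DR Smith", "Sharjeel Khan"),
  match_id = c(
    "C:\\Users\\AppData\\Local\\Temp\\Rtmpuu2sV2/psl_male_bbb/1075986",
    "1075986"
  ),
  stringsAsFactors = FALSE
)

players$match_id <- extract_match_id(players$match_id)
players$match_id
#> [1] "1075986" "1075986"
```

Because ids that are already clean pass through untouched, the same call should be safe on both Windows and Linux output.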

robjhyndman commented 2 years ago

I think this might depend on your operating system. Here's what I get on Ubuntu:

library(cricketdata)
fetch_cricsheet(type = "match", gender = "male", competition = "psl")
#> # A tibble: 206 × 25
#>    match_id balls_per_over team1    team2 gender season date  event match_number
#>    <chr>    <chr>          <chr>    <chr> <chr>  <chr>  <chr> <chr> <chr>       
#>  1 1075986  6              Islamab… Pesh… male   2016/… 2017… Paki… 1           
#>  2 1075988  6              Karachi… Pesh… male   2016/… 2017… Paki… 3           
#>  3 1075995  6              Islamab… Kara… male   2016/… 2017… Paki… 10          
#>  4 1075997  6              Islamab… Pesh… male   2016/… 2017… Paki… 12          
#>  5 1076001  6              Lahore … Pesh… male   2016/… 2017… Paki… 16          
#>  6 1076005  6              Islamab… Kara… male   2016/… 2017… Paki… 20          
#>  7 1076007  6              Karachi… Isla… male   2016/… 2017… Paki… <NA>        
#>  8 1076008  6              Peshawa… Kara… male   2016/… 2017… Paki… <NA>        
#>  9 1075994  6              Peshawa… Quet… male   2016/… 2017… Paki… 9           
#> 10 1075990  6              Karachi… Quet… male   2016/… 2017… Paki… 5           
#> # … with 196 more rows, and 16 more variables: venue <chr>, city <chr>,
#> #   toss_winner <chr>, toss_decision <chr>, player_of_match <chr>,
#> #   umpire1 <chr>, umpire2 <chr>, reserve_umpire <chr>, tv_umpire <chr>,
#> #   match_referee <chr>, winner <chr>, winner_wickets <chr>, method <chr>,
#> #   winner_runs <chr>, outcome <chr>, eliminator <chr>
fetch_cricsheet(type = "player", gender = "male", competition = "psl")
#> # A tibble: 4,534 × 3
#>    team             player        match_id
#>    <chr>            <chr>         <chr>   
#>  1 Islamabad United DR Smith      1075986 
#>  2 Islamabad United Sharjeel Khan 1075986 
#>  3 Islamabad United BJ Haddin     1075986 
#>  4 Islamabad United SR Watson     1075986 
#>  5 Islamabad United SW Billings   1075986 
#>  6 Islamabad United Misbah-ul-Haq 1075986 
#>  7 Islamabad United Imran Khalid  1075986 
#>  8 Islamabad United Amad Butt     1075986 
#>  9 Islamabad United Saeed Ajmal   1075986 
#> 10 Islamabad United Mohammad Sami 1075986 
#> # … with 4,524 more rows

Created on 2022-02-20 by the reprex package (v2.0.1)

I'll need to find a Windows computer to test it on.

robjhyndman commented 2 years ago

@jacquietran Are you using Windows? Can you replicate this issue?

jacquietran commented 2 years ago

Hey @robjhyndman and @Dazzalytics!

I can reproduce the issue on Windows:

> library(cricketdata)

> fetch_cricsheet(type = "match", gender = "male", competition = "psl")
# trying URL 'https://cricsheet.org/downloads/psl_male_csv2.zip'
# Content type 'application/zip' length 1002739 bytes (979 KB)
# downloaded 979 KB

# A tibble: 206 x 25
#    match_id     balls_per_over team1  team2  gender season date  event
#    <chr>        <chr>          <chr>  <chr>  <chr>  <chr>  <chr> <chr>
#  1 "C:\\Users\~ 6              Islam~ Pesha~ male   2016/~ 2017~ Paki~
#  2 "C:\\Users\~ 6              Karac~ Pesha~ male   2016/~ 2017~ Paki~
#  3 "C:\\Users\~ 6              Islam~ Karac~ male   2016/~ 2017~ Paki~
#  4 "C:\\Users\~ 6              Islam~ Pesha~ male   2016/~ 2017~ Paki~
#  5 "C:\\Users\~ 6              Lahor~ Pesha~ male   2016/~ 2017~ Paki~
#  6 "C:\\Users\~ 6              Islam~ Karac~ male   2016/~ 2017~ Paki~
#  7 "C:\\Users\~ 6              Karac~ Islam~ male   2016/~ 2017~ Paki~
#  8 "C:\\Users\~ 6              Pesha~ Karac~ male   2016/~ 2017~ Paki~
#  9 "C:\\Users\~ 6              Pesha~ Quett~ male   2016/~ 2017~ Paki~
# 10 "C:\\Users\~ 6              Karac~ Quett~ male   2016/~ 2017~ Paki~
# ... with 196 more rows, and 17 more variables: match_number <chr>,
#   venue <chr>, city <chr>, toss_winner <chr>, toss_decision <chr>,
#   player_of_match <chr>, umpire1 <chr>, umpire2 <chr>,
#   reserve_umpire <chr>, tv_umpire <chr>, match_referee <chr>,
#   winner <chr>, winner_wickets <chr>, method <chr>,
#   winner_runs <chr>, outcome <chr>, eliminator <chr>

> fetch_cricsheet(type = "player", gender = "male", competition = "psl")
# A tibble: 4,534 x 3
#    team             player        match_id
#    <chr>            <chr>         <chr>
#  1 Islamabad United DR Smith      "C:\\Users\\jacqu\\AppData\\Local\\Temp\\Rtmp29I~
#  2 Islamabad United Sharjeel Khan "C:\\Users\\jacqu\\AppData\\Local\\Temp\\Rtmp29I~
#  3 Islamabad United BJ Haddin     "C:\\Users\\jacqu\\AppData\\Local\\Temp\\Rtmp29I~
#  4 Islamabad United SR Watson     "C:\\Users\\jacqu\\AppData\\Local\\Temp\\Rtmp29I~
#  5 Islamabad United SW Billings   "C:\\Users\\jacqu\\AppData\\Local\\Temp\\Rtmp29I~
#  6 Islamabad United Misbah-ul-Haq "C:\\Users\\jacqu\\AppData\\Local\\Temp\\Rtmp29I~
#  7 Islamabad United Imran Khalid  "C:\\Users\\jacqu\\AppData\\Local\\Temp\\Rtmp29I~
#  8 Islamabad United Amad Butt     "C:\\Users\\jacqu\\AppData\\Local\\Temp\\Rtmp29I~
#  9 Islamabad United Saeed Ajmal   "C:\\Users\\jacqu\\AppData\\Local\\Temp\\Rtmp29I~
# 10 Islamabad United Mohammad Sami "C:\\Users\\jacqu\\AppData\\Local\\Temp\\Rtmp29I~
# ... with 4,524 more rows

robjhyndman commented 2 years ago

It should be fixed now: https://github.com/robjhyndman/cricketdata/commit/80ded20df6446647a804a0b97fb0a9e5658becce

Dazzalytics commented 2 years ago

Thank you for the prompt feedback.

I have version 0.1.1 of the package and, unfortunately, the issue is still there. I do not have package-building experience, but I would like to help fix this issue (with some guidance), and possibly look at adding more functionality to the package or work on ideas that you might already have.

robjhyndman commented 2 years ago

You will need to reinstall the package from GitHub, then try it again. I have tested it on Windows and it worked for me on two computers.
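For anyone unsure how to do that, here is one way it could look. This is a sketch that assumes the remotes package is used for the install; the wrapper name `reinstall_cricketdata` is hypothetical and only there so nothing is downloaded until it is called.

```r
# Hypothetical wrapper, not part of cricketdata: reinstall the development
# version from GitHub. Assumes the 'remotes' package is an acceptable
# installer; it is installed from CRAN first if it is missing.
reinstall_cricketdata <- function() {
  if (!requireNamespace("remotes", quietly = TRUE)) {
    install.packages("remotes")
  }
  remotes::install_github("robjhyndman/cricketdata")
  # Report the version now installed on disk.
  utils::packageVersion("cricketdata")
}
```

Calling `reinstall_cricketdata()` in a fresh R session and then restarting R before loading the package should avoid picking up the previously loaded copy.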

Dazzalytics commented 2 years ago

Yes, it is working now. I appreciate your help!