robjhyndman / cricketdata

International cricket data for men and women, Tests, ODIs and T20s
http://pkg.robjhyndman.com/cricketdata/

Wrong number of innings and average #27

Closed: myaseen208 closed this issue 1 year ago

myaseen208 commented 1 year ago

The batting data for Babar Azam fetched from https://www.espncricinfo.com/ through cricketdata doesn't match the statistics shown on https://www.espncricinfo.com/ itself. In particular, the number of Test innings and the batting average differ.

library(cricketdata)
library(tidyverse)
library(stringi)

df1 <- 
  fetch_player_data(
      playerid  = find_player_id(searchstring = "Babar Azam")$ID
    , matchtype = c("test", "odi", "t20")[1]
    , activity  = c("batting", "bowling", "fielding")[1]
    ) %>% 
  filter(Date !=  "2022-12-26")

df1 %>% 
  summarise(
    Mat   = length(unique(Date))
  , Inns  = length(Innings)
  , NO    = sum(stri_detect(str = Dismissal, regex = "not out|notout"), na.rm = TRUE)
  , Runs  = sum(Runs, na.rm = TRUE)
  , Ave   = sum(Runs, na.rm = TRUE)/(length(Innings) - sum(stri_detect(str = Dismissal, regex = "not out|notout"), na.rm = TRUE))
    )

# A tidytable: 1 × 5
    Mat  Inns    NO  Runs   Ave
  <int> <int> <int> <dbl> <dbl>
1    45    83     9  3470  46.9

However, https://www.espncricinfo.com/player/babar-azam-348144 gives 81 innings with an average of 48.19.
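As a quick arithmetic check using only the figures quoted above: a batting average is runs divided by dismissals, i.e. innings minus not-outs. Keeping runs (3470) and not-outs (9) fixed and changing only the innings count is enough to move the average from one value to the other, which suggests the gap lies in which rows are counted as innings.

```r
# Batting average = runs / dismissals, where dismissals = innings - not outs.
runs    <- 3470
not_out <- 9

avg_83 <- runs / (83 - not_out)  # innings count from the fetched data
avg_81 <- runs / (81 - not_out)  # innings count shown on ESPNcricinfo

round(c(avg_83, avg_81), 2)
#> [1] 46.89 48.19
```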

robjhyndman commented 1 year ago

There were two innings in which he did not bat; in the fetched data these rows have a missing Dismissal. Remove them with filter(!is.na(Dismissal)) and you get the same figures as ESPNcricinfo:

library(cricketdata)
library(tidyverse)
library(stringi)

# All Test batting innings for Babar Azam
df1 <- fetch_player_data(
  playerid  = find_player_id(searchstring = "Babar Azam")$ID,
  matchtype = "test",
  activity  = "batting"
)

df1 %>%
  filter(
    !is.na(Dismissal),    # drop innings in which he did not bat
    Date != "2022-12-26"  # exclude the 2022-12-26 match, as in the original report
  ) %>%
  summarise(
    Mat  = length(unique(Date)),
    Inns = length(Innings),
    NO   = sum(stri_detect(str = Dismissal, regex = "not out|notout"), na.rm = TRUE),
    Runs = sum(Runs, na.rm = TRUE),
    Ave  = Runs / (Inns - NO)  # batting average = runs / dismissals
  )
#> # A tibble: 1 × 5
#>     Mat  Inns    NO  Runs   Ave
#>   <int> <int> <int> <dbl> <dbl>
#> 1    45    81     9  3470  48.2

Created on 2022-12-29 with reprex v2.0.2
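The same fix can be packaged as a small helper. The sketch below is hypothetical: `batting_summary()` is not part of cricketdata, and it is written in base R so it runs without network access. It assumes a data frame with the `Date`, `Innings`, `Dismissal` and `Runs` columns used in the code above, and, as described in the reply, treats rows with a missing `Dismissal` as innings in which the player did not bat. The toy data frame is made up purely for illustration.

```r
# Hypothetical helper (not part of cricketdata): summarise batting rows
# shaped like the output of fetch_player_data(..., activity = "batting").
batting_summary <- function(df) {
  # Count matches before dropping rows, so a match in which the player
  # never batted would still count as played.
  mat <- length(unique(df$Date))
  # Rows with no Dismissal are assumed to be innings the player did not bat.
  df <- df[!is.na(df$Dismissal), ]
  no   <- sum(grepl("not out|notout", df$Dismissal))
  inns <- nrow(df)
  runs <- sum(df$Runs, na.rm = TRUE)
  data.frame(
    Mat  = mat,
    Inns = inns,
    NO   = no,
    Runs = runs,
    Ave  = runs / (inns - no)  # runs per dismissal
  )
}

# Made-up example: three matches, one innings recorded as did-not-bat (NA).
toy <- data.frame(
  Date      = as.Date(c("2022-01-01", "2022-01-01", "2022-02-01",
                        "2022-02-01", "2022-03-01")),
  Innings   = c(1, 3, 2, 4, 1),
  Dismissal = c("caught", NA, "not out", "bowled", "lbw"),
  Runs      = c(50, NA, 30, 20, 40)
)

res <- batting_summary(toy)
res
```

Counting matches before filtering is a deliberate choice here; the reprex above counts them after filtering, which gives the same 45 matches in Babar Azam's case.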