mrc-ide / epireview
GNU General Public License v3.0

Is Chan and Nishiura serial interval duplicated in Ebola parameters? #86

Open joshwlambert opened 1 month ago

joshwlambert commented 1 month ago

Looking through the delay distribution entries in the Ebola data, I found two serial interval entries with $article_label "Chan 2020 (1)" and "Chan 2020 (2)". Comparing these rows, they are identical in every respect other than $parameter_data_id and $article_label. Is this an accidental duplication?

Reproducible example to show what I mean:

ebola_data <- epireview::load_epidata("ebola")
#> Warning: One or more parsing issues, call `problems()` on your data frame for details,
#> e.g.:
#>   dat <- vroom(...)
#>   problems(dat)
#> Warning in load_epidata_raw(pathogen, "outbreak"): No data found for ebola
#> Warning: One or more parsing issues, call `problems()` on your data frame for details,
#> e.g.:
#>   dat <- vroom(...)
#>   problems(dat)
#> Warning in epireview::load_epidata("ebola"): No outbreaks information found for
#> ebola
#> Data loaded for ebola
ebola_params <- ebola_data$params
chan_si_idx <- which(
  grepl(pattern = "Chan 2020 \\(", x = ebola_params$article_label) &
    grepl(pattern = "serial", x = ebola_params$parameter_type)
)
# what is the difference between Chan 2020 (1) and Chan 2020 (2) entries
ebola_params[chan_si_idx, ]
#> # A tibble: 2 × 77
#>   id      parameter_data_id covidence_id pathogen parameter_type parameter_value
#>   <chr>   <chr>                    <int> <chr>    <chr>                    <dbl>
#> 1 86e39e… 5c8d68c39d1c3b98…        15896 Ebola v… Human delay -…            15.3
#> 2 86e39e… e824649c690f81ba…        15896 Ebola v… Human delay -…            15.3
#> # ℹ 71 more variables: exponent <dbl>, parameter_unit <chr>,
#> #   parameter_lower_bound <dbl>, parameter_upper_bound <dbl>,
#> #   parameter_value_type <chr>, parameter_uncertainty_single_value <dbl>,
#> #   parameter_uncertainty_singe_type <chr>,
#> #   parameter_uncertainty_lower_value <dbl>,
#> #   parameter_uncertainty_upper_value <dbl>, parameter_uncertainty_type <chr>,
#> #   cfr_ifr_numerator <int>, cfr_ifr_denominator <int>, …
waldo::compare(ebola_params[chan_si_idx[1], ], ebola_params[chan_si_idx[2], ])
#> old vs new
#>                           parameter_data_id article_label
#> - old[1, ] 5c8d68c39d1c3b9870ecaaff0280d02e Chan 2020 (1)
#> + new[1, ] e824649c690f81ba50fae3d81254a9f2 Chan 2020 (2)
#> `old$parameter_data_id`: "5c8d68c39d1c3b9870ecaaff0280d02e"
#> `new$parameter_data_id`: "e824649c690f81ba50fae3d81254a9f2"
#> `old$article_label`: "Chan 2020 (1)"
#> `new$article_label`: "Chan 2020 (2)"

Created on 2024-06-14 with reprex v2.1.0
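
If it helps with triage, the same check could be run across a whole table rather than for one article. The snippet below is only a rough sketch in base R: `toy_params` is a made-up stand-in for `ebola_data$params` (the "Example 2019" row and all of its values are hypothetical), and it assumes that `parameter_data_id` and `article_label` are the only columns expected to differ between otherwise duplicated rows.

```r
# Toy stand-in for ebola_data$params; the rows are illustrative, not real data.
toy_params <- data.frame(
  parameter_data_id = c("a1", "b2", "c3"),
  article_label     = c("Chan 2020 (1)", "Chan 2020 (2)", "Example 2019"),
  covidence_id      = c(15896L, 15896L, 1L),
  parameter_type    = c(
    "Human delay - serial interval",
    "Human delay - serial interval",
    "Human delay - incubation period"
  ),
  parameter_value   = c(15.3, 15.3, 9.0),
  stringsAsFactors  = FALSE
)

# Assumption: these are the only columns that differ between duplicated entries.
id_cols <- c("parameter_data_id", "article_label")
content <- toy_params[, setdiff(names(toy_params), id_cols), drop = FALSE]

# Flag every row whose remaining columns exactly match some other row,
# scanning in both directions so the first copy is flagged as well.
is_dup <- duplicated(content) | duplicated(content, fromLast = TRUE)
dup_rows <- toy_params[is_dup, ]
print(dup_rows)
```

On the real data, replacing `toy_params` with `ebola_params` should list any other pairs that differ only in those two columns, assuming every remaining column is a plain atomic column.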