vincentarelbundock / countrycode

R package: Convert country names and country codes. Assigns region descriptors.
https://vincentarelbundock.github.io/countrycode
GNU General Public License v3.0

Faster conversion from all codes (not country names) #342

Closed. etiennebacher closed this 11 months ago.

etiennebacher commented 11 months ago

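Simple speedup: apply toupper() only to the unique strings, then match the original vector back onto those results. It's not crucial, but it saves a couple of seconds every time I call countrycode() in this direction (from codes, not country names). It does use more memory, though: in the benchmark below, median time drops from about 3 s to under 1 s, while allocated memory rises from roughly 740 to 770 MB to about 1 GB. So it's your call.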
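A minimal sketch of the idea, assuming a plain character vector as input. `toupper_unique()` is a hypothetical helper for illustration, not countrycode's actual internal code: each distinct value is upper-cased once, and `match()` expands the results back to the full length.

```r
# Hypothetical helper (not countrycode's internals): upper-case each
# distinct value once, then map the results back onto the full vector.
toupper_unique <- function(x) {
  ux <- unique(x)          # distinct values, in order of first appearance
  up <- toupper(ux)        # the string operation now runs on few elements
  up[match(x, ux)]         # index each original element to its distinct value
}

# Many repeated codes, as in a typical country-year panel (NA included)
x <- sample(c("fra", "deu", "usa", NA), 1e6, replace = TRUE)
stopifnot(identical(toupper_unique(x), toupper(x)))
```

The trade-off is visible in the sketch: `match()` allocates an integer index as long as `x`, so some memory is exchanged for far fewer `toupper()` calls when values repeat heavily.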

library(bench)

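# Run the same benchmark against the main branch and the PR branch,
# each in a separate R session with that version of countrycode installed.
out <- cross::run(
  pkgs = c("vincentarelbundock/countrycode", "etiennebacher/countrycode@speedup-iso"),
  ~{
    library(countrycode)

    # 10 million codes sampled (with replacement) from three code schemes
    test <- data.frame(
      grp1 = sample(codelist$iso3c, 1e7, TRUE),
      grp2 = sample(codelist$cowc, 1e7, TRUE),
      grp3 = sample(codelist$eurostat, 1e7, TRUE)
    )

    bench::mark(
      countrycode(test$grp1, "iso3c", "country.name"),
      countrycode(test$grp2, "cowc", "country.name"),
      countrycode(test$grp3, "eurostat", "country.name"),
      iterations = 10,
      # The three expressions convert different inputs, so their results differ
      check = FALSE
    )
  }
)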

# Keep the key columns and label each branch as "main" or "fork"
tidyr::unnest(out, result) |>
  dplyr::select(pkg, expression, median, mem_alloc) |>
  dplyr::mutate(pkg = ifelse(grepl("vincent", pkg), "main", "fork")) |>
  dplyr::arrange(expression, dplyr::desc(pkg))
#> # A tibble: 6 × 4
#>   pkg   expression                                              median mem_alloc
#>   <chr> <bch:expr>                                            <bch:tm> <bch:byt>
#> 1 main  "countrycode(test$grp1, \"iso3c\", \"country.name\")"    3.34s  738.81MB
#> 2 fork  "countrycode(test$grp1, \"iso3c\", \"country.name\")" 991.98ms 1019.44MB
#> 3 main  "countrycode(test$grp2, \"cowc\", \"country.name\")"     3.03s  772.15MB
#> 4 fork  "countrycode(test$grp2, \"cowc\", \"country.name\")"  989.79ms    1.03GB
#> 5 main  "countrycode(test$grp3, \"eurostat\", \"country.name…    3.38s  739.88MB
#> 6 fork  "countrycode(test$grp3, \"eurostat\", \"country.name… 832.43ms 1020.47MB
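To put the timings in relative terms, the snippet below divides the main-branch medians by the fork's medians. The numbers are copied by hand from the output above and converted to seconds. The data frame and its column names are illustrative only.

```r
# Median times from the bench::mark() output above, in seconds
medians <- data.frame(
  code = c("iso3c", "cowc", "eurostat"),
  main = c(3.34, 3.03, 3.38),
  fork = c(0.99198, 0.98979, 0.83243)
)

# Speedup factor: how many times faster the fork's median run is
medians$speedup <- round(medians$main / medians$fork, 2)
print(medians)
```

This works out to roughly a 3x to 4x speedup across the three code schemes.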
vincentarelbundock commented 11 months ago

this is really great, thanks!