vincentarelbundock / countrycode

R package: Convert country names and country codes. Assigns region descriptors.
https://vincentarelbundock.github.io/countrycode
GNU General Public License v3.0

Naming the new crosswalk factory #282

Closed vincentarelbundock closed 3 years ago

vincentarelbundock commented 3 years ago

Hi @cjyetman and @NilsEnevoldsen

The next release of countrycode will have two new features of note:

First, the destination argument accepts a vector of destination codes, which are tried in order as fallbacks. For instance, “SRB” is a valid ISO code but not a valid CoW code, so we get:

library(countrycode)

x <- c("Algeria", "Serbia")

countrycode(x, "country.name", "cowc")
#> Warning in countrycode_convert(sourcevar = sourcevar, origin = origin, destination = dest, : Some values were not matched unambiguously: Serbia
#> [1] "ALG" NA

countrycode(x, "country.name", "iso3c")
#> [1] "DZA" "SRB"

countrycode(x, "country.name", c("cowc", "iso3c"))
#> Warning in countrycode_convert(sourcevar = sourcevar, origin = origin, destination = dest, : Some values were not matched unambiguously: Serbia
#> [1] "ALG" "SRB"

Second, there is a new “function factory” which can create arbitrary crosswalks or change the defaults of countrycode.

How should this function be called?

Minimal example:

name_to_iso3c <- countrycode_factory(origin = "country.name", destination = "iso3c")
name_to_iso3c(c("Algeria", "Canada"))
#> [1] "DZA" "CAN"

State code example:

# Download dictionary
state_dict <- "https://raw.githubusercontent.com/vincentarelbundock/countrycode/main/data/custom_dictionaries/us_states.csv"
state_dict <- read.csv(state_dict, stringsAsFactors = FALSE)

# Identify regular expression origin codes
attr(state_dict, "origin_regex") <- "state.regex"

# Set default values for the custom conversion function
statecode <- countrycode_factory(
  origin = "state.regex",
  destination = "abbreviation",
  custom_dict = state_dict
)

# Voilà!
statecode(c("Alabama", "New Mexico"), "state.regex", "abbreviation")
#> [1] "AL" "NM"

statecode(c("AL", "NM", "VT"), "abbreviation", "state")
#> [1] "Alabama"    "New Mexico" "Vermont"

cjyetman commented 3 years ago

What's the logic with the destination argument? Does it find matches within the first destination code, then use the second wherever the first gives NA, and so forth?

vincentarelbundock commented 3 years ago

> What's the logic with the destination argument? Does it find matches within the first destination code, then use the second wherever the first gives NA, and so forth?

Exactly. So, for example, if nomatch = "BlahBlah", then using a vector as destination will make no difference, because there will be no missing values left for the second pass.
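The fallback logic can be sketched in base R without the package. This is only an illustration of the behaviour described above, not the package's actual implementation: convert_with_fallback is a hypothetical helper, and the toy dictionary just copies the values from the outputs earlier in this thread.

```r
# Toy dictionary: values taken from the outputs shown earlier in this thread
dict <- data.frame(
  country.name = c("Algeria", "Serbia"),
  cowc         = c("ALG", NA),
  iso3c        = c("DZA", "SRB"),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not part of countrycode): try each destination in turn
convert_with_fallback <- function(x, origin, destinations, dict,
                                  nomatch = NA_character_) {
  idx <- match(x, dict[[origin]])          # row of each input in the dictionary
  out <- rep(NA_character_, length(x))
  for (dest in destinations) {
    missing <- is.na(out)                  # only fill what is still missing
    out[missing] <- dict[[dest]][idx[missing]]
    # nomatch is applied on every pass, so a non-NA nomatch
    # leaves nothing for later destinations to fill
    out[is.na(out)] <- nomatch
  }
  out
}

x <- c("Algeria", "Serbia")
convert_with_fallback(x, "country.name", c("cowc", "iso3c"), dict)
#> [1] "ALG" "SRB"

convert_with_fallback(x, "country.name", c("cowc", "iso3c"), dict,
                      nomatch = "BlahBlah")
#> [1] "ALG"      "BlahBlah"
```

The second call shows the nomatch case: the first pass already replaces every NA, so the iso3c fallback never gets a chance to run.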

NilsEnevoldsen commented 3 years ago

factory and crosswalk seem too generic, so countrycode_factory seems the best of the three to me, but I am not expert enough in R to be confident about Naming Things.

vincentarelbundock commented 3 years ago

> factory and crosswalk seem too generic, so countrycode_factory seems the best of the three to me, but I am not expert enough in R to be confident about Naming Things.

Yeah, that's probably right. We wouldn't want namespace conflicts with other packages, which might happen with such generic names. My main hesitation is that it would be nice to signal that countrycode is no longer just about countries: it can do arbitrary crosswalks. But I guess I made my bed when I called the package countrycode.

vincentarelbundock commented 3 years ago

Alright, since there doesn't seem to be an objection to countrycode_factory, we'll go with that.

Thanks both!

vincentarelbundock commented 3 years ago

The other thread is making me rethink this altogether. It's so easy to just wrap countrycode in a function with different defaults that I should probably just scrap this factory idea. Do you agree, @cjyetman?
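For reference, a plain wrapper with different defaults might look roughly like this. It is a minimal sketch rather than anything shipped in the package; the name name_to_iso3c is simply reused from the factory example above.

```r
library(countrycode)

# Hypothetical wrapper: same call as countrycode(), but with preset defaults
name_to_iso3c <- function(sourcevar,
                          origin = "country.name",
                          destination = "iso3c",
                          ...) {
  # Any other countrycode() argument (warn, nomatch, custom_match, ...)
  # is passed straight through
  countrycode(sourcevar, origin = origin, destination = destination, ...)
}

name_to_iso3c(c("Algeria", "Canada"))
#> [1] "DZA" "CAN"
```

Because origin and destination stay ordinary arguments, the defaults can still be overridden on a per-call basis.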

cjyetman commented 3 years ago

🤷🏻 Wasn't it in response to a specific request/goal? It does have the unique feature of using multiple destination codes in succession, right?

I do think it's probably unnecessary. Personally, I would not use this. I would rather specify arguments explicitly than create a custom function and use that... to me it's easier to read and understand.

NilsEnevoldsen commented 3 years ago

> I do think it's probably unnecessary. Personally, I would not use this. I would rather specify arguments explicitly than create a custom function and use that... to me it's easier to read and understand.

Ditto.

I suggest adding a section to the documentation on wrapping countrycode in a function with a custom dictionary and/or preset defaults.
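Such a section could be built around something like the following. This is a sketch under assumptions: the three-row state_dict is a toy stand-in for a full dictionary, and statecode is just an illustrative name borrowed from the example at the top of this thread.

```r
library(countrycode)

# Toy custom dictionary for illustration; a real one would cover every state
state_dict <- data.frame(
  state        = c("Alabama", "New Mexico", "Vermont"),
  abbreviation = c("AL", "NM", "VT"),
  stringsAsFactors = FALSE
)

# Hypothetical wrapper: fixes the dictionary and presets origin/destination
statecode <- function(sourcevar,
                      origin = "state",
                      destination = "abbreviation",
                      ...) {
  countrycode(sourcevar,
              origin = origin,
              destination = destination,
              custom_dict = state_dict,
              ...)
}

# Use the defaults, or override them explicitly to convert the other way
statecode(c("Alabama", "Vermont"))
statecode(c("AL", "NM"), origin = "abbreviation", destination = "state")
```

The dictionary is captured from the enclosing environment, so callers only ever supply the values to convert and, optionally, a different origin or destination column.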

vincentarelbundock commented 3 years ago

Thanks both for the thoughtful responses.

I removed the countrycode_factory function.

I kept in the possibility of using a vector for destination, which allows fallback codes.

I included an example function in the README.

I uploaded v1.3.0 to CRAN.

Cheers!