Create magic function - Githubissues

vincentarelbundock / countrycode

R package: Convert country names and country codes. Assigns region descriptors.

https://vincentarelbundock.github.io/countrycode

GNU General Public License v3.0


Create magic function #239

Closed davidsjoberg closed 4 years ago

davidsjoberg commented 4 years ago

In my immature package simplecountries I wanted to build a magic function that takes any character vector of country names and translates it to common English.

Reasons for a new package

Firstly, to simplify joins between datasets from different sources that use different names for the same country. I have used countrycode many times before, but thought that I needed to know which country name definition each source was using, which was sometimes impossible or time-consuming to find out. But since
Secondly, I wanted to add alternative country names that are in use even though they have no formal definition like ISO. I scraped them from Wikipedia (https://en.wikipedia.org/wiki/List_of_alternative_country_names).
Thirdly, I wanted a function that accepts inconsistent vectors, i.e. vectors that mix country name definitions, like UK and Sweden in the same vector.

To do this I scraped the Wikipedia page of alternative country names and translated the Wikipedia common names to countrycode's country.name.en column (maybe there is a better one?). Then I made a huge lookup table and deleted non-unique combinations.

This created a magic (wildcard) function that accepts any variation of country names and translates it to a simple country name which can be used on two datasets to be able to make a join or to simplify country names in a plot.

The code for the lookup table that is used in simple_country_name can be found here.

If you want to include a magic function, or already have one that I missed, that would be great. It would be much better suited to the already well-written and mature countrycode.

My suggestion for API would be a variation of countrycode::countrycode that does NOT require origin nor destination. But destination could absolutely be an argument for those who want it. But there should be a decent default, like country.name.en. Also, it would be nice to include Wikipedia's alternative country names.

That's all. Thanks for a great package!

vincentarelbundock commented 4 years ago

Very nice. Here's one way this could potentially be integrated into countrycode.

Maintain a named vector of known and unambiguous country name variants. This vector would be created based on the Wikipedia that you scraped, but also using the alternatives we already have identified in our test suite. We would ship this vector with countrycode.
Create a function called countryname with default arguments origin='country.name.en.regex' and destination='cldr.short.en'.
countryname first tries to convert using the regexes, and then fills in missing values using the list of known alternatives. This could be a two-step process, or even a single call to countrycode using the custom_match argument with our new named vector of variants.
If destination is not cldr.short.en, then we first convert to "clean" country names that we know countrycode can handle (again using the list of known alternatives), and then we call countrycode a second time to convert to iso3c or whatever.

This could be quite nice, because it would reduce the need for regexes to catch every single weird known variant. It would also require almost no new coding effort, since all the building blocks are here already.
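
A minimal sketch of the regex-then-fallback flow described above, written in base R so it runs without countrycode. The two regexes, the `known_variants` vector and both helper names are made up for illustration; they are not the package's internals or its real dictionary.

```r
# Toy stand-ins (hypothetical): two name regexes and a tiny table of known variants.
name_regex <- c(
  "Sweden"         = "sweden",
  "United Kingdom" = "united.?kingdom|britain|^uk$"
)
known_variants <- c(Sverige = "Sweden", Schweden = "Sweden", Simbabwe = "Zimbabwe")

# Step 1: try every regex; the first pattern that hits wins.
match_by_regex <- function(x, patterns) {
  out <- rep(NA_character_, length(x))
  for (i in seq_along(patterns)) {
    hit <- grepl(patterns[[i]], x, ignore.case = TRUE)
    out[hit & is.na(out)] <- names(patterns)[i]
  }
  out
}

# Step 2: fill whatever is still missing from the exact-match table.
convert_names <- function(x) {
  out <- match_by_regex(x, name_regex)
  still_missing <- is.na(out)
  out[still_missing] <- unname(known_variants[x[still_missing]])
  out
}

result <- convert_names(c("UK", "Sverige", "Simbabwe", "Atlantis"))
print(result)
```

Inputs that neither the regexes nor the table recognise stay `NA`, which mirrors how an unmatched value would surface to the user.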

vincentarelbundock commented 4 years ago

Not sure if @davidsjoberg gets notifications unless I ping them.

vincentarelbundock commented 4 years ago

OK, so here's a proof of concept. I've considerably expanded the idea by including all the CLDR country names, which means that we can now translate country names from any language (as long as they are "official" well-formed names in the respective languages). I stored the dictionary build code and the function in a Gist, which I load using source.

https://gist.github.com/vincentarelbundock/2e00c19c1972e73d708c7cb496bd00d2

url <- 'https://gist.githubusercontent.com/vincentarelbundock/2e00c19c1972e73d708c7cb496bd00d2/raw/15b9cabe96511521a1b40bb9b6d6a0c97662eebe/countryname.R'
source(url)
weird <- c('ジンバブエ', 'Afeganistãu',  'Barbadas', 'Sverige', 'UK',  'il-Georgia tan-Nofsinhar u l-Gżejjer Sandwich tan-Nofsinhar')
countryname(weird)
#> [1] "Zimbabwe"                              
#> [2] "Afghanistan"                           
#> [3] "Barbados"                              
#> [4] "Sweden"                                
#> [5] "UK"                                    
#> [6] "South Georgia & South Sandwich Islands"

This is rough, but what do you think? (Also curious about @cjyetman's and @NilsEnevoldsen's opinions on this.)

NilsEnevoldsen commented 4 years ago

Mixed feelings. IMHO these alternative names fall into two categories: English names and Non-English names.

English names should just be added to country.name.en.regex. (Indeed many already have been.)

Non-English names I'm unsure about.

cjyetman commented 4 years ago

Quick thoughts before I fall asleep...

This List of alternative country names is a cool resource. We should use this to build out our known name variations in our test suite.
Personally, I think it would be super cool if our regexes matched country names in any/all languages, though I'm uncertain how much of a performance concern that would be. I can't imagine a scenario where a user would be upset that countrycode gave a valid result for a country name that was non-English even though they didn't explicitly ask to match non-English variations.
Ideally, I would think this should work without any multi-stage process. It should just work, as countrycode always has.
To integrate these variations (language or otherwise) into our existing regexes, I see two paths: 1) convert them all into regexes as we've done in the past, which would probably be a big undertaking, or 2) add them as alternations as I've suggested before in #197, which would likely be a lot easier, but ultimately somewhat less flexible in terms of catching unknown variations of these variations.
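
To make the second path concrete, here is a rough base R sketch that turns a vector of known variants into one anchored alternation. The `escape_regex` helper and the Saint Lucia variants are illustrative assumptions, not code from countrycode or from #197.

```r
# Escape regex metacharacters so each variant is matched literally.
escape_regex <- function(x) {
  gsub("([.|()\\^{}+$*?]|\\[|\\])", "\\\\\\1", x)
}

# Hypothetical list of known spellings for one country.
variants <- c("Saint Lucia", "St. Lucia", "St Lucia")

# One anchored pattern: ^(variant1|variant2|...)$
pattern <- paste0("^(", paste(escape_regex(variants), collapse = "|"), ")$")

candidates <- c("st. lucia", "StX Lucia", "Sainte-Lucie")
matched <- grepl(pattern, candidates, ignore.case = TRUE)
print(data.frame(candidates, matched))
```

The last candidate shows the trade-off mentioned above: a spelling that is not in the list is simply not caught, whereas a hand-written regex might have been loose enough to match it.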

cjyetman commented 4 years ago

and one quick caution... that Wikipedia page includes a lot of things, some labeled as "ambiguous", some as "former" names, some as "initialism", etc... I'm not sure how to easily assess all of them and whether or not they should be included as valid variations, especially the ones in non-Latin scripts, whose validity I can't possibly judge.

vincentarelbundock commented 4 years ago

All good points.

I agree that to the extent possible, known English variations should be caught by our regexes. We should also exploit that cool Wikipedia list to build up our test suite. Those two things should be priority number 1.

That said, the possibility of converting names in many languages seems super cool to me. I was playing with some Eurobarometer data where countries are encoded in local languages, and the crappy function above did a nice job of converting most of them. That's quite nice to have out of the box.

I do agree with the caution raised above w.r.t. the Wikipedia list. For instance, it's probably not desirable to automatically convert "Rhodesia" to "Zimbabwe" without any warning. I also empathize with the concern about our inability to check other languages ourselves. In that respect, the CLDR names seem less problematic, since presumably there is some quality assurance there.

With respect to proposal https://github.com/vincentarelbundock/countrycode/issues/197, I think we should probably go ahead for the most difficult cases. We know the countries which have been pain points since the start. It's probably better to be a bit conservative and more explicit with those.

FWIW I did a quick benchmark of the proposed "US Virgin Islands" regex against the one we currently use. The longer, more explicit version was about 5 times faster on a large dataset.
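
A side note on the dotted form of this regex, which is quoted in the next comment: an unescaped `.` matches any single character, so such a pattern tolerates some punctuation differences but does not cover spellings with the characters missing. A quick base R check (the candidate strings are made-up examples, and this is not a benchmark):

```r
# Pattern as quoted in this thread; the dots are left unescaped on purpose.
pattern <- "United States Virgin Islands|U.S. Virgin Islands|Virgin Islands. U.S."

candidates <- c(
  "U.S. Virgin Islands",     # literal dots
  "Virgin Islands, U.S.",    # comma where the pattern has a dot
  "US Virgin Islands",       # no characters where the dots are
  "British Virgin Islands"   # a different territory
)

dot_matches <- grepl(pattern, candidates, ignore.case = TRUE)
print(setNames(dot_matches, candidates))
```

The first two candidates match and the last two do not, so "US Virgin Islands" would still need its own alternative or a `\\.?` in the pattern.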

vincentarelbundock commented 4 years ago

But maybe we don't have to go overboard and banish dots entirely, so that:

simplifies to:

"United States Virgin Islands|U.S. Virgin Islands|Virgin Islands. U.S."

davidsjoberg commented 4 years ago

Nice discussion!

I actually included all non-English names but left them out of my example since reprex can't handle weird characters. I think non-English names are a nice feature, but they should probably be dominated by English variations, i.e. if a name is ambiguous then the English version should be used. If it is unique it doesn't hurt to add that feature.

I also think the ambiguous alternative names should be excluded. Maybe even exclude all variations that you do not understand, a concern that @cjyetman raised.

But I want to raise a few points. First, countrycode actually was more powerful than I knew. For example:

> countrycode::countrycode("gret britain", 'country.name', 'iso3c')
[1] "GBR"
> countrycode::countrycode("Uk.", 'country.name', 'iso3c')
[1] "GBR"
>

It correctly caught two weirdish country names, and actually solved my original problem. However, I think it was hard to understand this behaviour from the documentation. For example, country.name is not a column in countrycode::codelist. A suggestion would be to have country.name as the default in the countrycode function. That would help users (like me) who just want to join data frames with country names and don't know anything about ISO standards etc. It seems like too steep a learning curve for the issue. The wise who know what they have and what they want should still be able to specify it.

Another issue is countrycode's speed. It's probably the regex matching, which is very computationally heavy. For example, I made a micro benchmark with countrycode::countrycode and simplecountries::simple_country_name:

library(microbenchmark)

x <- countrycode::codelist$country.name.en

microbenchmark(
  countrycode::countrycode(x, 'country.name', 'country.name'),
  countrycode::countrycode(x, 'country.name.en', 'country.name.en'),
  simplecountries::simple_country_name(x),
  times = 10
)

My function is about 2000 (!) times faster, even when the input, origin and destination are the same. I think a simple lookup table should be used and regex should only be used when absolutely necessary. I could be wrong here since I haven't read your source code.

To conclude, I think it would be awesome if countrycode had a "guess" as default, much like country.name in the example above. Also, I think you should use a banal lookup table in the backend and only turn to regex for non-matches, to greatly improve efficiency. Also, add some of the alternative country names from Wikipedia that are in use but might not be in any formal country name definition.

cjyetman commented 4 years ago

Generally, I like the idea of having origin = "country.name" as a default, but I think we have had that discussion before and decided against it. One unfortunate result of doing that would be that users would eventually want/expect to be able to do something like countrycode(df$country, "iso3c"), but this would not work unless they explicitly named the destination parameter (i.e. countrycode(df$country, destination = "iso3c")) because of the existing order of the parameters. The only way around that would be to reorder the parameters, which would easily break lots of old code, so it's a non-starter.
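
The positional-matching problem can be reproduced with a toy function. `convert` below is a hypothetical stand-in that only copies countrycode's argument order (sourcevar, origin, destination) and adds the default origin under discussion; it does no real conversion.

```r
# Hypothetical stand-in: same argument order as countrycode(), with a default origin.
convert <- function(sourcevar, origin = "country.name", destination) {
  if (missing(destination)) stop("destination is missing")
  paste(origin, "->", destination)
}

# Naming the destination works and picks up the default origin.
named_call <- convert("Sweden", destination = "iso3c")

# The tempting short form binds "iso3c" to `origin`, so `destination` is never set.
positional_call <- tryCatch(
  convert("Sweden", "iso3c"),
  error = function(e) conditionMessage(e)
)

print(named_call)
print(positional_call)
```

R fills unnamed arguments in order, so a default on `origin` does not let callers skip it positionally, which is the reason a separate function without an origin argument comes up next.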

davidsjoberg commented 4 years ago

Yes, I see your point. It is probably better, then, to have a new function that removes the origin parameter, so that you can do something like country_name(df$country, "iso3c") like you said.

cjyetman commented 4 years ago

and the background for why country.name is not a column in countrycode::codelist...

There are currently two sets of regexes, one for English (country.name.en.regex) and one for German (country.name.de.regex). country.name is sort of an "alias" for country.name.en.regex to maintain backward compatibility. This infrastructure allows us to add regex sets for other languages etc. In fact, the use case being discussed here could become something like a column added to countrycode::codelist named country.name.alllanguages.regex which would contain regexes that match country names in as many languages as possible, and then the user would still have the normal behavior, but they could use something like this to match country names in any language: countrycode(df$country, origin = "country.name.alllanguages.regex", destination = "iso3c")

vincentarelbundock commented 4 years ago

Cool.

I'm not very enthusiastic about the prospect of changing default values of arguments or their order. But I think I see a consensus emerging:

New function called countryname whose goal is to magically convert country names from any language to any code.
Try to use a lookup table first (and maybe regex on the non-matching cases) to improve performance.
Use CLDR codes only, because we are pretty sure that those are "correct".

Does that work?

davidsjoberg commented 4 years ago

Yes. That sounds very nice! Maybe obvious, but countryname has country.name as its default destination?

Why limit to CLDR? You know more about the different origins/destinations, but the more the merrier, given that they are unambiguous?

vincentarelbundock commented 4 years ago

Yes, "country.name" would be the default destination.

Oh sorry, all the other names we have in codelist would make it: ISO, CoW, UN, etc. I thought we had come to an agreement that Wikipedia should be excluded because it includes many problematic cases, and because we don't have the language skills to distinguish.

Also, even if "Republic of Albania" (in Wiki but not codelist) is missing from the lookup table, our regexes will catch it.

davidsjoberg commented 4 years ago

Ok. sounds good. But the more you can add to the lookup table the faster the function will be.

But it makes sense not to include all Wikipedia alternatives since many of them are weird. I hadn't heard of all the alternative names of Sweden...

cjyetman commented 4 years ago

I made an alternate version of countryname that does exact matching first using custom_dict and then runs a second pass through the default regex on any unmatched values. @davidsjoberg is correct that doing it that way has a significant speed advantage (not that I doubted that). Might have to do a third pass though, unfortunately, to enable a selectable destination code since the custom dictionary only has one destination code.

library(countrycode)
library(microbenchmark)
library(dplyr)  # tibble(), filter() and %>% are used below

source('https://gist.githubusercontent.com/vincentarelbundock/2e00c19c1972e73d708c7cb496bd00d2/raw/15b9cabe96511521a1b40bb9b6d6a0c97662eebe/countryname.R')

countryname_cust_match <- 
  function(sourcevar, 
           origin = 'country.name.en', 
           destination = 'cldr.short.en') {

    out <- 
      countrycode(sourcevar = sourcevar,
                  origin = origin,
                  destination = destination,
                  custom_match = alternatives,
                  warn = FALSE)

    return(out)
  }

alternatives_cust_dict <- 
  tibble(country.name = names(alternatives), 
         cldr.short.en = alternatives) %>% 
  filter(!duplicated(country.name))  # "لتونی"matches both Latvia and Lithuania!?!

countryname_cust_dict <- 
  function(sourcevar) {

    out <- 
      countrycode(sourcevar = sourcevar,
                  origin = 'country.name',
                  destination = 'cldr.short.en',
                  custom_dict = alternatives_cust_dict,
                  warn = FALSE)

    out[is.na(out)] <- 
      countrycode(sourcevar = sourcevar[is.na(out)], 
                  origin = 'country.name.en', 
                  destination = 'cldr.short.en', 
                  warn = FALSE)

    return(out)
  }

weird <- c('ジンバブエ', 'Afeganistãu',  'Barbadas', 'Sverige', 'UK',  'il-Georgia tan-Nofsinhar u l-Gżejjer Sandwich tan-Nofsinhar')
countryname_cust_match(weird)
#> [1] "Zimbabwe"                              
#> [2] "Afghanistan"                           
#> [3] "Barbados"                              
#> [4] "Sweden"                                
#> [5] "UK"                                    
#> [6] "South Georgia & South Sandwich Islands"
countryname_cust_dict(weird)
#> [1] "Zimbabwe"                              
#> [2] "Afghanistan"                           
#> [3] "Barbados"                              
#> [4] "Sweden"                                
#> [5] "UK"                                    
#> [6] "South Georgia & South Sandwich Islands"

x <- countrycode::codelist$country.name.en

microbenchmark(
  countryname_cust_match(x),
  countryname_cust_dict(x),
  times = 20
)
#> Unit: milliseconds
#>                       expr       min        lq      mean    median        uq
#>  countryname_cust_match(x) 888.86483 902.83125 920.65200 908.82574 923.27087
#>   countryname_cust_dict(x)  70.64385  73.73317  76.73232  77.20547  79.36352
#>         max neval
#>  1040.08927    20
#>    85.15852    20

identical(countryname_cust_match(x), countryname_cust_dict(x))
#>  [1] TRUE
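
On the duplicated-name comment in the code above: `filter(!duplicated(country.name))` silently keeps whichever mapping comes first. A base R sketch of an alternative that lists the ambiguous names and drops them entirely; `alternatives_toy` and the placeholder key `SharedName` are invented for illustration and stand in for the real `alternatives` vector.

```r
# Toy named vector: names are input spellings, values are destination names.
# "SharedName" is a made-up placeholder for a spelling that maps to two countries.
alternatives_toy <- c(
  Sverige    = "Sweden",
  Schweden   = "Sweden",
  SharedName = "Latvia",
  SharedName = "Lithuania"
)

# Count distinct destinations per input spelling.
n_targets <- vapply(
  split(unname(alternatives_toy), names(alternatives_toy)),
  function(v) length(unique(v)),
  integer(1)
)

# Spellings that point at more than one country are ambiguous.
ambiguous <- names(n_targets)[n_targets > 1]

# Keep only the unambiguous mappings.
clean <- alternatives_toy[!names(alternatives_toy) %in% ambiguous]

print(ambiguous)
print(clean)
```

Whether ambiguous spellings should be dropped or resolved by hand is a judgment call; the sketch only makes them visible.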

cjyetman commented 4 years ago

Actually, there's a "bug" in @vincentarelbundock's version of countryname... because the custom_match overrides every valid regex match with a value that doesn't fit the non-default destination code...

source('https://gist.githubusercontent.com/vincentarelbundock/2e00c19c1972e73d708c7cb496bd00d2/raw/15b9cabe96511521a1b40bb9b6d6a0c97662eebe/countryname.R')
weird <- c('ジンバブエ', 'Afeganistãu',  'Barbadas', 'Sverige', 'UK',  'il-Georgia tan-Nofsinhar u l-Gżejjer Sandwich tan-Nofsinhar')
countryname(weird)
#> [1] "Zimbabwe"                              
#> [2] "Afghanistan"                           
#> [3] "Barbados"                              
#> [4] "Sweden"                                
#> [5] "UK"                                    
#> [6] "South Georgia & South Sandwich Islands"
countryname(weird, destination = 'iso3c')
#> [1] "Zimbabwe"                              
#> [2] "Afghanistan"                           
#> [3] "Barbados"                              
#> [4] "Sweden"                                
#> [5] "GBR"                                   
#> [6] "South Georgia & South Sandwich Islands"
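
The mixed output above can be mimicked in a few lines of base R. This is only a sketch of the override behaviour as described in this comment; `apply_custom_match` and the toy vectors are hypothetical and are not countrycode internals.

```r
# Hypothetical re-creation of the override step: any input found in the
# custom table replaces the converted value, whatever the destination was.
apply_custom_match <- function(converted, sourcevar, custom_match) {
  hit <- sourcevar %in% names(custom_match)
  converted[hit] <- unname(custom_match[sourcevar[hit]])
  converted
}

sourcevar        <- c("Zimbabwe", "Sverige", "UK")
iso3c_from_regex <- c("ZWE", NA, "GBR")                          # what the regex pass gives
names_table      <- c(Zimbabwe = "Zimbabwe", Sverige = "Sweden") # values are names, not iso3c

mixed <- apply_custom_match(iso3c_from_regex, sourcevar, names_table)
print(mixed)
```

Only "UK", which is absent from the toy table, keeps its iso3c code; everything else is overwritten with a country name.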

I would suggest something like this... the third pass through countrycode only occurs if a non-default destination code is chosen, and it doesn't have much impact on the speed because it also forces exact matching...

source('https://gist.githubusercontent.com/vincentarelbundock/2e00c19c1972e73d708c7cb496bd00d2/raw/15b9cabe96511521a1b40bb9b6d6a0c97662eebe/countryname.R')

alternatives_cust_dict <- 
  tibble(country.name = names(alternatives), 
         cldr.short.en = alternatives) %>% 
  filter(!duplicated(country.name))  # "لتونی"matches both Latvia and Lithuania!?!

countryname <- 
  function(sourcevar, destination = 'cldr.short.en', warn = FALSE) {

    out <- 
      countrycode(sourcevar = sourcevar,
                  origin = 'country.name',
                  destination = 'cldr.short.en',
                  custom_dict = alternatives_cust_dict,
                  warn = FALSE)

    out[is.na(out)] <- 
      countrycode(sourcevar = sourcevar[is.na(out)], 
                  origin = 'country.name.en', 
                  destination = 'cldr.short.en', 
                  warn = warn)

    if (destination != 'cldr.short.en') {
      out <-
        countrycode(sourcevar = out,
                    origin = 'cldr.short.en', 
                    destination = destination,
                    custom_dict = countrycode::codelist, 
                    warn = warn)
    }

    return(out)
  }

weird <- c('ジンバブエ', 'Afeganistãu',  'Barbadas', 'Sverige', 'UK',  'il-Georgia tan-Nofsinhar u l-Gżejjer Sandwich tan-Nofsinhar')
countryname(weird)
#> [1] "Zimbabwe"                              
#> [2] "Afghanistan"                           
#> [3] "Barbados"                              
#> [4] "Sweden"                                
#> [5] "UK"                                    
#> [6] "South Georgia & South Sandwich Islands"
countryname(weird, destination = 'iso3c')
#> [1] "ZWE" "AFG" "BRB" "SWE" "GBR" "SGS"

library(microbenchmark)

x <- countrycode::codelist$country.name.en

microbenchmark(
  countryname(x),
  countryname(x, destination = 'iso3c'),
  times = 20
)
#> Unit: milliseconds
#>                                   expr      min       lq     mean   median
#>                         countryname(x) 70.35332 71.76669 75.99029 75.63757
#>  countryname(x, destination = "iso3c") 70.05191 74.09016 77.45305 77.18706
#>        uq      max neval
#>  77.95850 90.27649    20
#>  80.20842 86.50942    20

head(countryname(x))
#> [1] "Afghanistan"    "Åland Islands"  "Albania"        "Algeria"       
#> [5] "American Samoa" "Andorra"
head(countryname(x, destination = 'iso3c'))
#> [1] "AFG" "ALA" "ALB" "DZA" "ASM" "AND"

vincentarelbundock commented 4 years ago

Does that also work for factor or tibble sourcevar?

cjyetman commented 4 years ago

It should work exactly how countrycode works... factors as sourcevar work fine, if a tibble is subset it will fail with an error message, and if a tibble column is properly extracted it will work fine.

source('https://gist.githubusercontent.com/vincentarelbundock/2e00c19c1972e73d708c7cb496bd00d2/raw/15b9cabe96511521a1b40bb9b6d6a0c97662eebe/countryname.R')

alternatives_cust_dict <- 
  tibble(country.name = names(alternatives), 
         cldr.short.en = alternatives) %>% 
  filter(!duplicated(country.name))  # "لتونی"matches both Latvia and Lithuania!?!

countryname <- 
  function(sourcevar, destination = 'cldr.short.en', warn = FALSE) {

    out <- 
      countrycode(sourcevar = sourcevar,
                  origin = 'country.name',
                  destination = 'cldr.short.en',
                  custom_dict = alternatives_cust_dict,
                  warn = FALSE)

    out[is.na(out)] <- 
      countrycode(sourcevar = sourcevar[is.na(out)], 
                  origin = 'country.name.en', 
                  destination = 'cldr.short.en', 
                  warn = warn)

    if (destination != 'cldr.short.en') {
      out <-
        countrycode(sourcevar = out,
                    origin = 'cldr.short.en', 
                    destination = destination,
                    custom_dict = countrycode::codelist, 
                    warn = warn)
    }

    return(out)
  }

weird <- c('ジンバブエ', 'Afeganistãu',  'Barbadas', 'Sverige', 'UK',  'il-Georgia tan-Nofsinhar u l-Gżejjer Sandwich tan-Nofsinhar')

weird_fctr <- as.factor(weird)
countryname(weird_fctr)
# works
countryname(weird_fctr, destination = 'iso3c')
# works

weird_tbl <- tibble(weird)
countryname(weird_tbl)
# fails
countryname(weird_tbl[1])
# fails
countryname(weird_tbl[, 1])
# fails
countryname(weird_tbl[[1]])
# works
countryname(weird_tbl$weird)
# works

If you wanted it to work with subset tibbles that have only 1 column, you could add this to the top of the function (but I thought you decided against doing any manipulation like this):

if (inherits(sourcevar, "tbl_df") && length(sourcevar) == 1) {
  message("You passed a tibble with 1 column. You probably meant to *extract* 
          the column rather than subset the tibble. Since it's pretty easy to 
          guess what you want since there's only one column, we're gonna do 
          that for you automatically. If you don't want to see this message anymore
          *extract* the column you want to use as a sourcevar before passing it to 
          countrycode (i.e. it must be a vector, not a one column tibble)")
  sourcevar <- sourcevar[[1]]
}
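
For anyone unsure about the subset/extract distinction, it can be seen without tibble at all. The sketch below uses a plain data.frame as a stand-in; note one difference: `df[, 1]` drops to a vector for a data.frame, while a tibble keeps it as a tibble, so `drop = FALSE` is used here to imitate the tibble behaviour.

```r
# Plain data.frame stand-in for the one-column tibble above.
df <- data.frame(country = c("Sweden", "UK"), stringsAsFactors = FALSE)

# Subsetting keeps the table structure: these are still data frames.
subset_single   <- df[1]
subset_twoindex <- df[, 1, drop = FALSE]

# Extracting returns the underlying character vector.
extract_bracket <- df[[1]]
extract_dollar  <- df$country

print(c(
  subset_single   = is.data.frame(subset_single),
  subset_twoindex = is.data.frame(subset_twoindex),
  extract_bracket = is.character(extract_bracket),
  extract_dollar  = is.character(extract_dollar)
))
```

A function that expects a vector therefore needs `[[` or `$` at the call site, unless it unwraps one-column tables itself as in the snippet above.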

vincentarelbundock commented 4 years ago

The countryname function is now implemented in 749843e

FYI, @cjyetman I had to use country.name.en internally, because cldr.short.en does not cover all names (e.g., cow.name). There might be a performance hit that we want to investigate.

Thanks @davidsjoberg for the great suggestion, and thanks all for your insights!

Please let me know if it works on your end so I can close the issue.

vincentarelbundock commented 4 years ago

This should work:

remotes::install_github('vincentarelbundock/countrycode')
library(countrycode)
weird <- c('ジンバブエ', 'Afeganistãu',  'Barbadas', 'Sverige', 'UK',  'il-Georgia tan-Nofsinhar u l-Gżejjer Sandwich tan-Nofsinhar')
countryname(weird)
countryname(weird, 'iso3c')

davidsjoberg commented 4 years ago

Works great! Thank you for fast development and responsiveness!

Well done! I'll update the readme of my simplecountries repo to use countrycode::countryname instead :) You may close the issue. When do you think it will be on CRAN?

vincentarelbundock commented 4 years ago

Cool. Not 100% sure. Hopefully next week.