vincentarelbundock / countrycode

R package: Convert country names and country codes. Assigns region descriptors.
https://vincentarelbundock.github.io/countrycode
GNU General Public License v3.0
346 stars 84 forks source link

cctld as origin code does not work as expected #244

Closed cjyetman closed 2 years ago

cjyetman commented 4 years ago

First, in the help file for codelist, it lists "ccTLD" as an origin and destination code, however it is named "cctld" in the actual codelist object.

Second, I haven't fully investigated this yet, but something is wrong with the matching on that code...

library(countrycode)
countrycode('.de', 'cctld', 'country.name')
#> Warning in countrycode(".de", "cctld", "country.name"): Some values were not matched unambiguously: .DE
#> [1] NA
any(codelist$cctld == '.de')
#> [1] TRUE

My first guess would be a toupper in countrycode since the error message says it couldn't match ".DE", but I suppose it could be related to the "." in the string, but that shouldn't matter because it shouldn't be running regex on it, right?

cjyetman commented 4 years ago

On line 63, we force the sourcevar vector to be in all caps to achieve case-insensitive matching, but that's with the assumption/insistence that all non-regex origin codes in codelist are in all caps, because we don't enforce that in the code for countrycode. All non-regex origin codes up until now have been in all caps, but the cctld origin code probably is more technically correct in lower case (though in most environments I've seen, host names are always converted to lowercase anyway).

One proposal would be to change line 201 matchidxs <- match(levels(sourcefctr), dict[[origin]]) to matchidxs <- match(levels(sourcefctr), toupper(dict[[origin]])) to ensure that the selected destination code is also converted to all caps when matching, though not modifying it otherwise so the return values remain in the same case as the original.

I've roughly tested this and it fixes this issue, but I haven't actually made the code change and run it through the testing system to see if it breaks anything else.

Eventually I may get around to making a PR, but feel free to implement this or a similar solution in the meantime if anyone else has the time.

vincentarelbundock commented 4 years ago

That makes sense to me.

It is currently modelsummary week, so I probably won't get around to this for a little while. But when I find the time I can give it a shot if you haven't already by that time.

vincentarelbundock commented 4 years ago

BTW, do we want a pkgdown website like the one I just built for modelsummary?

https://vincentarelbundock.github.io/modelsummary/

cjyetman commented 4 years ago

wrt pkgdown, I think that would be cool, and it's pretty easy to setup/manage

vincentarelbundock commented 2 years ago

Turns out doing just upper breaks some country name conversions, but we can work around this with a cctld-specific if{}else{} logic.

Thanks for the report and solution.