vincentarelbundock / countrycode

R package: Convert country names and country codes. Assigns region descriptors.
https://vincentarelbundock.github.io/countrycode
GNU General Public License v3.0

Antarctica has no continent #353

Closed by geryan 6 months ago

geryan commented 6 months ago

Firstly - thanks to the devs for such a thoroughly useful package amid a clusterfrock of competing definitions.

The continent definition seems unfit for a lot of purposes. This is strongly related to #288 (North and South America being grouped together). Although I can see the logic of backwards compatibility and of going with the existing source, it creates some unexpected behaviours, namely a continent (Antarctica) having an NA continent:

countrycode::countrycode(
  sourcevar = "Antarctica",
  origin = "country.name",
  destination = "iso3c"
)
#> [1] "ATA"

countrycode::countrycode(
  sourcevar = "Antarctica",
  origin = "country.name",
  destination = "continent"
)
#> Warning: Some values were not matched unambiguously: Antarctica
#> [1] NA

Would the devs be open to an additional continent definition if I can find a consistent one that would solve both the Antarctica issue and #288?

I can see where continent is defined in dictionary/data_regions.csv, though I can't find where on GitHub that file is used to build codelist. If you can point me in the right direction, I'd be happy to try to implement it.

cjyetman commented 6 months ago

I would strongly suggest using the custom_match argument when you want to convert names or codes that countrycode does not cover internally:

countrycode::countrycode(
  sourcevar = "Antarctica",
  origin = "country.name",
  destination = "continent",
  custom_match = c(Antarctica = "Antarctica")
)
#> [1] "Antarctica"
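Because custom_match is applied element by element, the same override also works on a vector that mixes covered and uncovered names. The sketch below assumes a released version of countrycode is installed; the input vector places is hypothetical.

```r
library(countrycode)

# Hypothetical input: two countries in the built-in dictionary plus Antarctica
places <- c("France", "Antarctica", "Brazil")

continents <- countrycode(
  sourcevar = places,
  origin = "country.name",
  destination = "continent",
  # Values whose names appear here are returned as given instead of
  # being looked up in the dictionary
  custom_match = c(Antarctica = "Antarctica")
)
print(continents)
```

France and Brazil are resolved through the dictionary, while Antarctica is taken from custom_match, so no NA or warning is produced for it.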

vincentarelbundock commented 6 months ago

Thanks for the report!

I agree with @cjyetman that, in general, we should be reluctant to add yet another regional variation, and that a better solution is typically to use the custom_match argument.

That said, our continent code is hosted in a plain CSV, and it does indeed seem to be missing an Antarctica row.

I updated it, and in version 1.5.0.9002 from GitHub you can now do:

library(countrycode)

countrycode("Antarctica", "country.name", "iso3c")
# [1] "ATA"

countrycode("Antarctica", "country.name", "continent")
# [1] "Antarctica"