vincentarelbundock / countrycode

R package: Convert country names and country codes. Assigns region descriptors.
https://vincentarelbundock.github.io/countrycode
GNU General Public License v3.0
346 stars 84 forks source link

Accept NA values #302

Closed geotheory closed 2 years ago

geotheory commented 2 years ago

Is it not desirable to accept NA values as inputs and simply return NAs for them? I think this would help users to avoid having to create a wrapper function.

countrycode::countrycode(NA, origin = 'iso2c', destination = 'country.name')
Error in countrycode::countrycode(NA, origin = "iso2c", destination = "country.name") : 
  sourcevar must be a character or numeric vector. This error often
             arises when users pass a tibble (e.g., from dplyr) instead of a
             column vector from a data.frame (i.e., my_tbl[, 2] vs. my_df[, 2]
                                              vs. my_tbl[[2]])
geotheory commented 2 years ago

Obviously not to be confused with Namibia..

cjyetman commented 2 years ago

NA by itself in R reduces to a logical vector

class(NA)
#> [1] "logical"

If you use the more specific NA_character_ or include the NA in a character vector, it works as expected

countrycode::countrycode(NA_character_, origin = 'iso2c', destination = 'country.name')
#> [1] NA

countrycode::countrycode(c(NA, "US"), origin = 'iso2c', destination = 'country.name')
#> [1] NA              "United States"

I would argue that countrycode::countrycode(NA, origin = 'iso2c', destination = 'country.name') should be an error, because NA is not an expected vector to be passed as an argument.

geotheory commented 2 years ago

Ah R's NA quirk strikes again. I hear what you're saying. No strong view though I suspect the approach will mean some some users encounter rare failures e.g. when an automated workflow encounters a single NA input (maybe parsed from JSON) - which in practice can be harder to debug in production.

vincentarelbundock commented 2 years ago

Ah yeah, this is weird and annoying. I don't have an especially strong view either, but am inclined to defer to @cjyetman, as he always seems to fall on the "correct" side of these debates.

How about we just improve the error message to mention this corner case?

cjyetman commented 2 years ago

I have a strong feeling that it should be the responsibility of the web scraper (or whatever is generating the data) to return expected objects, e.g. if the web scraper is building a character vector because that's what it will usually find on the page, then it should deal with weird cases like no data or blanks or unfound elements etc. in an appropriate way, i.e. return a character vector with NA if necessary.

That being said, I'm struggling to think of an edge case where countrycode() auto returning NA if sourcevar is precisely NA (logical and only one element). It does then open the door though for questions like:

So, I don't see any harm in dealing with this very precise case in an automatic way, other than it possibly necessitates justification for why we don't deal with other similar but more complicated cases in an automatic way, which could then lead to creating and maintaining very sophisticated auto-fixing NA problems code, all to to deal with something that really just shouldn't be happening.

geotheory commented 2 years ago

Improved error message would be good - esp if it can be tailored to this failure case.

cjyetman commented 2 years ago

coercion in R is typically uni-directional, e.g. from "higher" classes to "lower" classes, but not the other way around

vincentarelbundock commented 2 years ago

fe1c4a6c0a