luispfonseca closed this issue 3 years ago
For the countryname case:
- "Sint Maarten" = "Sint Maarten". Correct.
- "Saint-Martin" = "Sint Maarten". Unsure if this is correct. In what language is this a country name? It's unfortunate if it is a country name in some language, because AFAIK Saint-Martin is the name of an island, not the name of a country.
- "Saint Martin" = NA. I think this is correct; I don't think there's a language in which "Saint Martin" is a country name.
- "Saint Martin FR" = NA. I think this is correct; I don't think there's a language in which "Saint Martin FR" is a country name.

For the countrycode case:
- "Sint Maarten" = "Sint Maarten". Correct.
- "Saint-Martin" = NA. Correct. "Saint-Martin" is the name of an island, not the name of a country.
- "Saint Martin" = NA. Correct. "Saint Martin" is ambiguous.
- "Saint Martin FR" = "Saint Martin (French part)". Correct.

Thanks @NilsEnevoldsen for the clarification. Super useful, as always!
Thank you for such a quick response!
I think one part of the issue is that I may have misunderstood the countryname function. In any case, it does seem inconsistent to me that "Saint-Martin" would be matched with "Sint Maarten" but "Saint Martin" would be unmatched.
I am not well versed in either geography or the criteria of countrycode, so to ensure I do not misunderstand the issue, please allow me to ask:
It appears in any case that this is desired behavior, so I think the issue can be closed. For now, I am just asking so I have a better understanding of the package, as I use it regularly. Feel free to direct me to documentation I may have missed.
Thank you for your work on this great package.
I'm not sure it's possible to draw clean lines between the multiple forms of sovereign and quasi-sovereign entities out there. There are dozens of ongoing territorial disputes, and the UN can't even seem to resolve them! Who are we to think we can adjudicate? At some point, we have to concede that any conversion scheme in countrycode will be highly imperfect and sometimes inconsistent. So while your first bullet is definitely a consideration, I don't think it is necessarily dispositive.
From my perspective (not speaking for others), the more important issue is your second bullet: ambiguity. If "Saint Martin" could refer to the whole island - including both Sint Maarten and Saint Martin (French part) - but also refer to just the French part, then ambiguity arises. In those cases, I think it is "safer" for countrycode to return NA and to issue an explicit warning to that effect.
Users can easily use the nomatch or the custom_match arguments to fill in the missing value. It's a small additional burden on the user, but at least we guard against a potential problem.
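For anyone landing here later, a minimal sketch of what that could look like. The mapping of "Saint Martin" to the French part (ISO3 code MAF) is purely an assumption made for illustration; whether it is right depends on what the string means in your own data. The variable names x, resolved, and kept are hypothetical.

```r
library(countrycode)

x <- c("Sint Maarten", "Saint Martin")

# custom_match takes a named vector: names are values in the origin data,
# values are what to return in the destination scheme.
# ASSUMPTION (illustration only): "Saint Martin" means the French part here.
resolved <- countrycode(
  x,
  origin = "country.name",
  destination = "iso3c",
  custom_match = c("Saint Martin" = "MAF")
)

# nomatch = NULL fills unmatched entries with the original string instead of NA.
# warn = FALSE silences the "not matched unambiguously" warning for this call.
kept <- countrycode(
  x,
  origin = "country.name",
  destination = "iso3c",
  nomatch = NULL,
  warn = FALSE
)
```

custom_match is probably the safer of the two when the ambiguity matters, since it forces you to write the decision down explicitly, whereas nomatch = NULL just passes the unresolved string through.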
Makes sense to me. The only thing then I'd think could still warrant a potential change is the inconsistency in output between "Saint-Martin" and "Saint Martin" for the countryname function.
library(countrycode)
packageVersion("countrycode")
#> [1] '1.2.0'
countryname(c("Sint Maarten", "Saint-Martin", "Saint Martin"))
#> [1] "Sint Maarten" "Sint Maarten" NA
Thank you for taking the time, once again. I'll close this.
Good catch!
The issue with countryname is that we use a massive set of name variations to do automagic conversion. This usually gives good results, but can sometimes produce ambiguous ones, like in this case. It is not realistic to audit this massive set of variations manually, so countryname will always remain inherently more “dangerous” than countrycode. For instance:
library(countrycode)
countryname(c("Sint Maarten", "Saint-Martin", "Saint Martin"))
#> Warning in countrycode_convert(sourcevar = sourcevar, origin = origin, destination = dest, : Some values were not matched unambiguously: Saint Martin
#> [1] "Sint Maarten" "Sint Maarten" NA
countrycode(c("Sint Maarten", "Saint-Martin", "Saint Martin"),
            origin = "country.name",
            destination = "country.name")
#> Warning in countrycode_convert(sourcevar = sourcevar, origin = origin, destination = dest, : Some values were not matched unambiguously: Saint-Martin, Saint Martin
#> [1] "Sint Maarten" NA NA
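One way to use that difference defensively is to run both functions and flag any input that countryname matched but the regex-based countrycode conversion did not. This is only a rough sketch, not something the package does for you; x, by_name, by_regex, and suspicious are hypothetical names, and the result will depend on the installed version of countrycode.

```r
library(countrycode)

x <- c("Sint Maarten", "Saint-Martin", "Saint Martin")

# Name-variation based conversion (more permissive)
by_name <- suppressWarnings(countryname(x))

# Regex-based conversion (more conservative)
by_regex <- suppressWarnings(
  countrycode(x, origin = "country.name", destination = "country.name")
)

# Inputs that only the permissive conversion matched; worth a manual look
suspicious <- x[!is.na(by_name) & is.na(by_regex)]
suspicious
```

Anything that ends up in suspicious can then be resolved by hand, for example through custom_match.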
I added a warning about this in the countryname documentation:
Thank you for your work on this package.
I found this issue. I don't have time to fix it at the moment, but this is something I could look into at a later time if you tell me what needs to be fixed.
Created on 2021-06-22 by the reprex package (v2.0.0)