Closed by Olucik 2 years ago
Thanks for the report, but I'm not sure about some of these.
I'm not completely closed to any of these ideas, but an actual argument should be made before effort is expended.
Also, you might want to look into the custom_match argument.
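As a rough illustration of the custom_match suggestion, the sketch below passes a named vector whose names are the exact source strings and whose values are the results to return. It assumes the countrycode package is installed; the override value "North Korea" and the variable name `overrides` are chosen only for illustration.

```r
library(countrycode)  # assumes the countrycode package is installed

# Hypothetical override: the name is the exact source string, the value is
# the result we want returned for it.
overrides <- c("Korea, Democratic Republic of" = "North Korea")

result <- countrycode(
  "Korea, Democratic Republic of",
  origin = "country.name",
  destination = "country.name",
  custom_match = overrides
)
print(result)
```

This keeps one-off spellings out of the shared regexes: the override lives in the user's own script and only applies to strings that match a name in the vector exactly.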
thank you. "Korea, Democratic People's Republic of" works, but "Korea, Democratic Republic of" does not with the current dev version.
I see. Is that a common form? And how would you modify the regex?
dprk|d.p.r.k|korea.+(d.p.r|dpr|north|dem.*peo.*rep.*)|(d.p.r|dpr|north|dem.*peo.*rep.*).+korea
it's how the Social Progress Index names it (for all years). I haven't seen it anywhere else yet. Maybe it still falls under the "custom" category.
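For anyone weighing the regex question above, here is a minimal base R sketch that compares the pattern quoted in this thread with one possible relaxation in which the "peo.*" part is optional. The `relaxed` pattern is only an assumption for discussion, not what the package uses, and it would still need to be checked against the patterns for every other country before being proposed.

```r
# Pattern as quoted in this thread.
current <- "dprk|d.p.r.k|korea.+(d.p.r|dpr|north|dem.*peo.*rep.*)|(d.p.r|dpr|north|dem.*peo.*rep.*).+korea"

# Hypothetical variant (not from the package): "peo.*" is made optional.
relaxed <- "dprk|d.p.r.k|korea.+(d.p.r|dpr|north|dem.*(peo.*)?rep.*)|(d.p.r|dpr|north|dem.*(peo.*)?rep.*).+korea"

names_to_check <- c(
  "Korea, Democratic People's Republic of",
  "Korea, Democratic Republic of",
  "Korea, Republic of"
)

# Case-insensitive, Perl-compatible matching.
matches <- function(pattern, x) {
  grepl(pattern, x, ignore.case = TRUE, perl = TRUE)
}

current_hits <- matches(current, names_to_check)
relaxed_hits <- matches(relaxed, names_to_check)

print(data.frame(name = names_to_check,
                 current = current_hits,
                 relaxed = relaxed_hits))
```

The third name is included as a guard: a relaxed North Korea pattern should still not match "Korea, Republic of".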
good to know. I just tried googling the expression in quotes, and can't really find other instances. I think I'll leave this issue open for future consideration, but not do anything for now.
Sorry if that seems unresponsive, I'm just not 100% convinced by the use-case, and I'm super busy at work these days (and trying to minimize non-essential tasks).
thank you!
Just hit this one, "Central African Rep.", in CEPII IPD 2012...
> countrycode("Central African Rep.", "country.name", "country.name")
[1] NA
Warning message:
In countrycode("Central African Rep.", "country.name", "country.name") :
Some values were not matched unambiguously: Central African Rep.
What's your view on abbreviations? Should those be countrycode's responsibility? I suppose we could make it countrycode's responsibility, but slippery slope, etc.
I think we're already on that slope since we currently do: "Korea, Rep. of", "Rep. of Korea", "U.S.A.", "D.P.R. Korea", "D.R. Congo", "U.S. Virgin Islands", etc. Since there's a finite number of country names, and an even smaller finite list of words/names within that which can be reasonably abbreviated, that slope may be slippery, but not so long.
And since the regex codes are really the core value of the package, then maybe they should be as adaptable as possible? Of course, what does/should trump all of that is who/when/how does it get done, does it negatively affect anything else, etc., which are completely valid concerns also.
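To make the abbreviation argument concrete, here is a small base R sketch of a pattern that matches on word stems rather than full words, so that shortened forms still match. The pattern `car_pattern` is hypothetical and is not the regex the package ships for the Central African Republic.

```r
# Hypothetical stem-based pattern (not the package's regex): each word only
# has to start with the given stem, so abbreviated forms still match.
car_pattern <- "\\bcentr.*\\bafr.*\\brep"

candidates <- c(
  "Central African Republic",
  "Central African Rep.",
  "Centr. African Rep.",
  "South Africa"
)

hits <- grepl(car_pattern, candidates, ignore.case = TRUE, perl = TRUE)
print(data.frame(name = candidates, matched = hits))
```

The trade-off is the usual one: shorter stems accept more abbreviations but raise the risk of colliding with another country's name, which is why a negative case like "South Africa" is worth keeping in the check.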
I'm convinced.
changed the title because this thread eventually led to an agreement that some abbreviations should be considered for addition to the regexes
Thanks again for opening this issue. If someone has very specific suggestions for changes to the regular expressions, I encourage them to create a Pull Request by modifying the dictionary/data_regex.csv file.
For now, we'll close this issue to tidy up the repo.
countrycode("Centr. African Rep.", "country.name", "country.name") should result in "Central African Republic"
countrycode("Dominic. Republic.", "country.name", "country.name") should result in "Dominican Republic"
countrycode("Kuweit", "country.name", "country.name") should result in "Kuwait"
countrycode("Timor", "country.name", "country.name") should result in "Timor Leste"
countrycode("Korea, Democratic Republic of", "country.name", "country.name") should result in "Democratic People's Republic of Korea"