vincentarelbundock / countrycode

R package: Convert country names and country codes. Assigns region descriptors.
https://vincentarelbundock.github.io/countrycode
GNU General Public License v3.0

French Guyana vs. Guyana #280

Closed iamgyang closed 3 years ago

iamgyang commented 3 years ago

When countrycode is given "French Guyana", it returns "GUY", but it should return "GUF": French Guiana and Guyana are different places (technically, French Guiana is part of France).

GUF: French Guiana
GUY: Guyana

NilsEnevoldsen commented 3 years ago

@iamgyang, is "French Guyana" always a typo, or is there a context in which it is correct?

Either way, it appears to be a common enough spelling and an unambiguous enough description that I agree French Guyana should match GUF rather than GUY.

At the same time, we should change Dutch Guyana to match SUR instead of GUY for the same reason.

It appears that British Guiana already matches GUY, so no change needed there.

The proper spellings (French Guiana, Dutch Guiana/Suriname, British Guiana/Guyana) are already all matched correctly.

vincentarelbundock commented 3 years ago

Thanks both! I pushed a couple commits which should hopefully fix this. You can check the regex changes and the new tests here:

https://github.com/vincentarelbundock/countrycode/commit/97f82fb374fb3c4404665f0cc539deeb6e66d01d

NilsEnevoldsen commented 3 years ago

@vincentarelbundock I would add a test that Guiana matches NA and change the GUY regex accordingly. Perhaps it's acceptable for the German regex -- I don't know -- but in English I think plain Guiana is too ambiguous… unlike British Guiana.
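To make the proposed behavior concrete, here is a minimal sketch of the matching rules discussed in this thread. It is written in Python only so that it is self-contained. The patterns and the `match_iso3c` helper are hypothetical illustrations, not the regexes or API that countrycode actually ships.

```python
import re

# Hypothetical patterns for illustration only; these are NOT the regexes
# used by countrycode. Both the "guyana" and "guiana" spellings are accepted
# after a qualifier, but plain "guiana" is left unmatched on purpose.
PATTERNS = {
    "GUF": r"^french.?gu(y|i)ana$",
    "SUR": r"^surinam|^dutch.?gu(y|i)ana$",
    "GUY": r"^guyana$|^british.?gu(y|i)ana$",
}


def match_iso3c(name):
    """Return the one code whose pattern matches, or None if zero or several match."""
    cleaned = name.strip().lower()
    hits = [code for code, pat in PATTERNS.items() if re.search(pat, cleaned)]
    return hits[0] if len(hits) == 1 else None


if __name__ == "__main__":
    for example in ["French Guyana", "Dutch Guyana", "British Guiana", "Guyana", "Guiana"]:
        print(example, "->", match_iso3c(example))
```

Returning None for an ambiguous or unmatched name plays the role of NA here. A qualified name (French, Dutch, British) resolves to one code, while the bare "Guiana" resolves to none.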

iamgyang commented 3 years ago

> @iamgyang, is "French Guyana" always a typo, or is there a context in which it is correct?

I do think it's a typo, but I don't know enough about the spellings to be sure. I found it spelled this way in an official data source from the FAO.

vincentarelbundock commented 3 years ago

> @vincentarelbundock I would add a test that Guiana matches NA and change the GUY regex accordingly. Perhaps it's acceptable for the German regex -- I don't know -- but in English I think plain Guiana is too ambiguous… unlike British Guiana.

Got it, thanks.