vincentarelbundock / countrycode

R package: Convert country names and country codes. Assigns region descriptors.
https://vincentarelbundock.github.io/countrycode
GNU General Public License v3.0
342 stars 83 forks

French + Italian regexes and minor corrections to German regexes and country names #303

Closed by vincentarelbundock 2 years ago

vincentarelbundock commented 2 years ago

Samuel Meichtry created and sent me a spreadsheet via email. It includes:

  1. country.name.fr.regex
  2. country.name.it.regex
  3. Minor fixes to country.name.de.regex
  4. Minor fixes to country.name.de

I would like us to merge these new features into dictionary/data_regex.csv, but I have not checked the spreadsheet yet. We need to diff it against the current file and review the changes manually before merging.
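
A minimal sketch of what such a diff could look like in base R. The two data frames are made-up stand-ins for the current dictionary and the submitted spreadsheet (in practice both would presumably be loaded with read.csv()), and the typo in the old German name is invented for illustration:

```r
# Hypothetical stand-in for the current dictionary (values are made up)
old <- data.frame(
  iso3c = c("AUT", "CHE", "DEU"),
  country.name.de = c("Austria", "Schweitz", "Deutschland"),
  stringsAsFactors = FALSE
)

# Hypothetical stand-in for the submitted spreadsheet: two fixes, one new column
new <- data.frame(
  iso3c = c("AUT", "CHE", "DEU"),
  country.name.de = c("Austria", "Schweiz", "Deutschland"),
  country.name.fr.regex = c("autriche", "suisse", "allemagne"),
  stringsAsFactors = FALSE
)

# Columns that exist only in the submission
added_cols <- setdiff(names(new), names(old))

# Align rows on the ISO code, then keep rows where the shared column differs
both <- merge(old, new, by = "iso3c", suffixes = c(".old", ".new"))
changed <- both[both$country.name.de.old != both$country.name.de.new, ]

print(added_cols)
print(changed[, c("iso3c", "country.name.de.old", "country.name.de.new")])
```

Merging on a stable key such as iso3c keeps the comparison valid even if the spreadsheet's rows were reordered.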

I'm busy this week and not sure when I'll have time, so I'm opening this issue to keep track of it.

data_regex.csv

vincentarelbundock commented 2 years ago

The author confirmed by email to Vincent Arel-Bundock on April 28, 2022 that he wrote the content of the file and that he is willing to release it under GPL3.

vincentarelbundock commented 2 years ago

I wrote an email to the author pointing out that many regexes match multiple countries, and that others do not match any countries at all. Waiting for a response.
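
One way to spot both kinds of problem is to count, for each regex, how many country names it matches. This is only a sketch with a made-up five-row dictionary, not the check that was actually run. The column names mirror those in this issue, but the regexes and their defects are invented for illustration:

```r
# Hypothetical miniature dictionary; two of the regexes are deliberately faulty
dict <- data.frame(
  country.name.it = c("Algeria", "Guinea", "Guinea-Bissau", "Moldavia", "Stati Uniti"),
  country.name.it.regex = c("algeria", "guinea", "guinea.bissau", "moldova", "stati.*uniti"),
  stringsAsFactors = FALSE
)

# For each regex, count how many names in the dictionary it matches
n_matches <- vapply(
  dict$country.name.it.regex,
  function(rx) sum(grepl(rx, dict$country.name.it, perl = TRUE, ignore.case = TRUE)),
  integer(1)
)

# A well-behaved regex matches exactly one name
too_many <- dict$country.name.it.regex[n_matches > 1]
no_match <- dict$country.name.it.regex[n_matches == 0]

print(too_many)
print(no_match)
```

Here "guinea" is flagged because it also matches "Guinea-Bissau", and "moldova" is flagged because it does not match the Italian name "Moldavia".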

vincentarelbundock commented 2 years ago

FYI, the development version of countrycode can now convert French and Italian country names using regexes. (This is super cool!)

I plan to release to CRAN in the next few days.

library(countrycode)

countrycode(c("Algérie", "États-Unis d'Amérique"), "country.name.fr", "iso3c")
#> [1] "DZA" "USA"

countrycode(c("DZA", "USA"), "iso3c", "country.name.fr")
#> [1] "Algérie"    "États-Unis"

countrycode(c("Stati Uniti", "Moldavia"), "country.name.it", "iso3c")
#> [1] "USA" "MDA"

countrycode(c("United States of America", "Moldova"), "country.name.en", "country.name.it")
#> [1] "Stati Uniti" "Moldavia"

countrycode(c("United States of America", "Moldova"), "country.name.en", "iso3c")
#> [1] "USA" "MDA"

vincentarelbundock commented 2 years ago

@cjyetman Do you mind if I ask Samuel if he wants to be added to DESCRIPTION as a "ctb"? Writing a full set of Italian and French regexes was a lot of work, and he was very responsive when I asked him to fix problems with the first draft. He certainly spent several annoying hours on this...

cjyetman commented 2 years ago

Of course! As in of course I do NOT mind.

cjyetman commented 2 years ago

A mention in News would make sense as well.

vincentarelbundock commented 2 years ago

package countrycode_1.4.0.tar.gz is on its way to CRAN.