vincentarelbundock / countrycode

R package: Convert country names and country codes. Assigns region descriptors.
https://vincentarelbundock.github.io/countrycode
GNU General Public License v3.0
346 stars 84 forks source link

Feature Request: Currency code ISO 4217 #240

Closed fr4nk3nst3in closed 4 years ago

fr4nk3nst3in commented 4 years ago

Is it possible to integrate currency names and iso 4217 code as additional destination option?

https://www.currency-iso.org/en/home/tables/table-a1.html

vincentarelbundock commented 4 years ago

Thanks for the suggestion. I couldn't figure out if this was an official source, so I wrote a coupl elines of code to scrape the IBAN website:

https://gist.github.com/vincentarelbundock/c94e29d575a1f33d14b7423c15937613

If you run this code, you'll see a dups object. That data.frame includes a list of countries with multiple associated currencies. How do you suggest we resolve those?

cjyetman commented 4 years ago

That is a weird site, but the more official looking one (https://www.iso.org/iso-4217-currency-codes.html) points directly to it. 🤷

vincentarelbundock commented 4 years ago

wow, OK. Let's use that spreadsheet, then. Still not sure what to do with many-to-one.

Maybe it's time for a real solution? But do I have the energy? Or do I just want to go to sleep until we get a vaccine? Those are the questions...

fr4nk3nst3in commented 4 years ago

I was redirected to the source site from the ISO site.

https://www.iso.org/iso-4217-currency-codes.html

The name of UK is an indication that the data come from the XML https://www.currency-iso.org/dam/downloads/lists/list_one.xml as I was provided before

UNITED KINGDOM OF GREAT BRITAIN AND NORTHERN IRELAND (THE) Pound Sterling GBP 826

In the XML File there is an attribute 'IsFund'. In most cases that resolves duplicates, if it's missing.

cjyetman commented 4 years ago

Here's a list of countries that have multiple currency codes (excluding "funds"), which would need to be resolved before this is ready...

library(readxl)
library(dplyr)
library(countrycode)

url <- 'https://www.currency-iso.org/dam/downloads/lists/list_one.xls'
download.file(url, tmpxls <- tempfile(fileext = '.xls'), quiet = TRUE)

read_excel(tmpxls, skip = 3) %>% 
  filter(is.na(Fund)) %>%  # filter out "fund" currencies
  filter(!is.na(`Alphabetic Code`)) %>%   # filter out rows with no currency code (currently Antartica, Palestine, South Georgia)
  mutate(country.name = countrycode(ENTITY, 'country.name', 'country.name', warn = FALSE)) %>% 
  filter(!is.na(country.name)) %>%   # filter out unmatched countries (currently e.g. EU, IMF, ZZ08_Gold, etc.)
  mutate(`Numeric Code` = as.numeric(`Numeric Code`)) %>% 
  select(country.name, currency = Currency, iso4217c = `Alphabetic Code`,
         iso4217n = `Numeric Code`) %>% 
  filter(duplicated(country.name) | duplicated(country.name, fromLast = TRUE))
#> # A tibble: 16 x 4
#>    country.name currency           iso4217c iso4217n
#>    <chr>        <chr>              <chr>       <dbl>
#>  1 Bhutan       Indian Rupee       INR           356
#>  2 Bhutan       Ngultrum           BTN            64
#>  3 Cuba         Cuban Peso         CUP           192
#>  4 Cuba         Peso Convertible   CUC           931
#>  5 El Salvador  El Salvador Colon  SVC           222
#>  6 El Salvador  US Dollar          USD           840
#>  7 Haiti        Gourde             HTG           332
#>  8 Haiti        US Dollar          USD           840
#>  9 Lesotho      Loti               LSL           426
#> 10 Lesotho      Rand               ZAR           710
#> 11 Namibia      Namibia Dollar     NAD           516
#> 12 Namibia      Rand               ZAR           710
#> 13 Panama       Balboa             PAB           590
#> 14 Panama       US Dollar          USD           840
#> 15 Uruguay      Peso Uruguayo      UYU           858
#> 16 Uruguay      Unidad Previsional UYW           927

unlink(tmpxls)

This XLSX from this page lists amendments that have been made, which gives some insight into why some of these countries have multiple currencies.

For example, this page gives information about ISO 4217 Amendment Number 169, which explains why Uruguay has two currencies (and hints at which one should be removed for our purposes here)...

The Central Bank of Uruguay is applying for new Fund currency code.

The Unidad Previsional (UP) is a daily accounting unit that tracks changes to the nominal wage index. The value of UP is expressed in terms of Uruguayan Pesos per UP, with the initial value of one peso (UYU 1.00) on 04/30/2018. The institution responsible for the calculation and publication is the Instituto Nacional de Estadística (National Bureau of Statistics) according to Law 19,608.

fr4nk3nst3in commented 4 years ago

In my opinion, I would remove USD as a second currency. That leaves following five countries.

# A tibble: 10 x 4                                                                                                                   
   country.name currency           iso4217c iso4217n
   <chr>        <chr>              <chr>       <dbl>
 1 Bhutan       Indian Rupee       INR           356
 2 Bhutan       Ngultrum           BTN            64
 3 Cuba         Cuban Peso         CUP           192
 4 Cuba         Peso Convertible   CUC           931
 5 Lesotho      Loti               LSL           426
 6 Lesotho      Rand               ZAR           710
 7 Namibia      Namibia Dollar     NAD           516
 8 Namibia      Rand               ZAR           710
 9 Uruguay      Peso Uruguayo      UYU           858
10 Uruguay      Unidad Previsional UYW           927
fr4nk3nst3in commented 4 years ago

The World Bank http://wdi.worldbank.org/table/1.1

says:

2 Bhutan       Ngultrum           BTN            64
3 Cuba         Cuban Peso         CUP           192
5 Lesotho      Loti               LSL           426
7 Namibia      Namibia Dollar     NAD           516
9 Uruguay      Peso Uruguayo      UYU           858
vincentarelbundock commented 4 years ago

great. @cjyetman do you want to finish up writing the script? I'm happy to do it if you don't have time.

cjyetman commented 4 years ago

I'm not sure I agree with dropping USD from countries with multiple currencies without doing further research. For instance, USD is surely the official currency of El Salvador currently, and I'm pretty sure Colons haven't been printed since 2001, though old ones are still legally accepted. https://www.bcr.gob.sv/eng/index.php?option=com_content&view=category&id=109&Itemid=249

vincentarelbundock commented 4 years ago

How about leaving it NA? I'm not sure we want to continuously monitor and investigate every currency crisis in the world to make sure noone pegs to the USD (edit: i know this is more than a peg).

fr4nk3nst3in commented 4 years ago

El Salvador is only one country, where World Bank says that currency unit USD is.

fr4nk3nst3in commented 4 years ago

https://databank.worldbank.org/source/economic-fitness/preview/on

Under 'Metadata' there is list of many meta information about countries. There is a possibility to download the metadata.

изображение

cjyetman commented 4 years ago

My preference would be to do it right the first time (do the research and figure out the most appropriate choice for each one), and then wait for users to alert us to any changes that occur in the future (if the scraper doesn't automatically pick it up). But maybe I'm just a stick in the mud. Probably the most important factor in this context is what currency these countries use for their official reporting, e.g. users wanting to match financial data they have downloaded to something else... I think?

cjyetman commented 4 years ago

@fr4nk3nst3in we have a strong preference (maybe insistence?) on getting data from original sources. If the World Bank maintains its own currency code and they make it available in a scrapable form, we could use that for something like wb_curr, but for ISO 4217, we need to get that from ISO.

fr4nk3nst3in commented 4 years ago

@cjyetman I'm with you for getting the data from original source. I would use World Bank as indication for resolving the duplicates. The XLS for Metadata allows to do it automatically. World Bank is a good indication for choosing one of currencies. Anyway to set is as 'NA' is an option for me to. Then let user report which currency could be used. But maybe it's better to use currency from official reporting as you says.

cjyetman commented 4 years ago

I checked each of those pairs enough to satisfy my own OCD. PR is here #241

vincentarelbundock commented 4 years ago

Thanks again @fr4nk3nst3in for your suggestion and engagement. And thanks to @cjyetman for writing the get_* script.

This is now merged. You should be able to run:

remotes::install_github('vincentarelbundock/countrycode')
library(countrycode)
countrycode(c('France', 'Canada'), 'country.name', 'currency')
fr4nk3nst3in commented 4 years ago

Thank you @vincentarelbundock, @cjyetman for support and fast implementation of my request.