vincentarelbundock / countrycode

R package: Convert country names and country codes. Assigns region descriptors.
https://vincentarelbundock.github.io/countrycode
GNU General Public License v3.0

Error in SanityCheck(tmp) : Assertion on 'any(is.na(dataset[["country.name.en.regex"]]))' failed: Must be FALSE. #285

Closed iago-pssjd closed 3 years ago

iago-pssjd commented 3 years ago

I get this error without adding any new data, just using the fork. Could it be due to the R version?

In addition, I get "Some values were not matched unambiguously: Åland Islands", and the same warning for Åland Islands, Curaçao, Réunion, São Tomé & Príncipe, and St. Barthélemy. This happens with data that is already in codelist. In a new dictionary, is it correct for the country variable to match country.name.en, or should I use another codelist variable for country?

R -f dictionary/build.R

R version 3.6.3 (2020-02-29) -- "Holding the Windsock"
Copyright (C) 2020 The R Foundation for Statistical Computing
Platform: x86_64-w64-mingw32/x64 (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> setwd(here::here())
> source('dictionary/utilities.R')
here() starts at P:/projects/countrycode
-- Attaching packages --------------------------------------------------------------------------------------------------------------------------------------------------------- tidyverse 1.3.1 --
v ggplot2 3.3.5     v purrr   0.3.4
v tibble  3.1.2     v dplyr   1.0.7
v tidyr   1.1.3     v stringr 1.4.0
v readr   1.4.0     v forcats 0.5.1
[conflicted] Will prefer dplyr::filter over any other package
[conflicted] Will prefer dplyr::select over any other package
Warning message:
In eval(ei, envir) : Running in a non-UTF-8 locale!
>
>
> ##################
> #  availability  #
> ##################
>
> scrapers <- Sys.glob('dictionary/get_*.R')
> datasets <- Sys.glob('dictionary/data_*.csv')
> datasets <- datasets[datasets != 'dictionary/data_regex.csv']
>
> # missing scrapers and datasets
> tokens_datasets <- str_replace_all(datasets, '.*data_|.csv', '')
> tokens_scrapers <- str_replace_all(scrapers, '.*get_|.R', '')
> tokens_scrapers <- setdiff(tokens_scrapers, "countryname_dict")
>
> if (length(setdiff(tokens_scrapers, tokens_datasets)) > 0) {
+     msg <- paste(setdiff(tokens_scrapers, tokens_datasets), collapse = ', ')
+     msg <- paste('Missing datasets:', msg)
+     stop(msg)
+ }
>
> if (length(setdiff(tokens_datasets, tokens_scrapers)) > 0) {
+     msg <- paste(setdiff(tokens_datasets, tokens_scrapers), collapse = ', ')
+     msg <- paste('Missing scrapers:', msg)
+     warning(msg)
+ }
Warning message:
Missing scrapers: aviation, icao, imf, ownop, regions, unpd
>
>
> ###############
> #  load data  #
> ###############
>
> dat <- list()
> dat$regex <- read_csv('dictionary/data_regex.csv', col_types = cols(), progress = FALSE)
>
> message('Load:')
Load:
> for (i in seq_along(datasets)) {
+     message('  ', tokens_datasets[i])
+     tmp <- read_csv(datasets[i], col_types = cols(), na = "", progress = FALSE) %>%
+            mutate(country.name.en.regex = CountryToRegex(country)) %>%
+            select(-country)
+     SanityCheck(tmp)
+     dat[[tokens_datasets[i]]] <- tmp
+ }
  aviation
Error in SanityCheck(tmp) :
  Assertion on 'any(is.na(dataset[["country.name.en.regex"]]))' failed: Must be FALSE.
Calls: SanityCheck -> <Anonymous> -> makeAssertion -> mstop
In addition: Warning message:
Problem with `mutate()` column `country.name.en.regex`.
i `country.name.en.regex = CountryToRegex(country)`.
i Some values were not matched unambiguously: Åland Islands

vincentarelbundock commented 3 years ago

You are probably running this with R on Windows, which does not handle Unicode characters like the first letter of Aland properly.
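
A quick way to check whether a session is affected, as a minimal sketch using base R only. It assumes the problem is the non-UTF-8 locale reported in the log above; the variable names are illustrative:

```r
# Report how the current session handles character encoding.
info <- l10n_info()
print(info[["UTF-8"]])             # TRUE when the session runs in a UTF-8 locale
print(Sys.getlocale("LC_CTYPE"))   # the active character-type locale

# Writing non-ASCII characters as Unicode escapes keeps the source file
# pure ASCII, so the string does not depend on how the file is decoded.
x <- "\u00c5land Islands"
print(Encoding(x))
print(nchar(x, type = "chars"))
```

If the first value printed is FALSE, names such as Åland Islands may be read or compared incorrectly, which would be consistent with the unmatched values in the warning.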

iago-pssjd commented 3 years ago

Sure, thanks!

vincentarelbundock commented 3 years ago

And just to be explicit (for future users and onlookers): build.R is not user-facing. It is a purely internal build script, meant mainly to run on the developers' machines, so we can't offer support for it.

There are easy "supported" ways to add custom dictionaries: the `custom_dict` argument and the `countrycode_factory` function.
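
For future readers, a minimal sketch of the `custom_dict` route. The data frame, its column names (`name`, `my_code`) and its values are hypothetical and only illustrate the shape of a custom dictionary:

```r
library(countrycode)

# Hypothetical dictionary: one row per country, one column per coding scheme.
my_dict <- data.frame(
  name    = c("Canada", "Mexico", "Brazil"),
  my_code = c("CA-X", "MX-X", "BR-X"),
  stringsAsFactors = FALSE
)

# With custom_dict, origin and destination refer to columns of my_dict.
res <- countrycode(
  c("Mexico", "Canada"),
  origin      = "name",
  destination = "my_code",
  custom_dict = my_dict
)
print(res)
```

This stays within the package's user-facing interface and does not require running dictionary/build.R.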