AltfunsMA opened this issue 4 years ago
I have spent the last 3 hours trying to plot a complete map of Africa!
The fact that X = ne_countries(country = ...) does not use the same strings as X$name or even X$name_long is highly unexpected behaviour.
(@jessexknight it would probably be easier to download the world country map and then subset it by continent name)
Thanks, but my data are for Sub-Saharan Africa only, and in general I think it is reasonable to select countries arbitrarily by name.
I realize the naming is sometimes inconsistent in the Natural Earth source files (e.g. "United States" versus "United States of America"). However, the names accepted by the country argument of ne_countries() are the values of the admin field in the Natural Earth source files. One way to access those names is:
library(rnaturalearth)
df <- ne_countries(continent = "Africa", returnclass = "sf")
df$admin
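Those names can also be used to check your own vector before calling ne_countries(). The sketch below uses only base R: check_admin_names() is a hypothetical helper (not part of rnaturalearth), and the short valid vector is a stand-in for df$admin so the example runs offline.

```r
# Hypothetical helper (not part of rnaturalearth): return, for each name
# that is not an exact admin value, the admin values that contain it.
check_admin_names <- function(x, valid) {
  unmatched <- setdiff(x, valid)
  # Case-insensitive, literal substring match against the admin values.
  suggestions <- lapply(unmatched, function(n) {
    valid[grepl(tolower(n), tolower(valid), fixed = TRUE)]
  })
  names(suggestions) <- unmatched
  suggestions
}

# Stand-in for df$admin so the sketch runs without downloading data.
valid <- c("Kenya", "Nigeria", "United Kingdom", "United States of America")
res <- check_admin_names(c("Kenya", "United States", "USA"), valid)
res
```

With real data the call would be check_admin_names(my_names, df$admin), where my_names is your own character vector. A name that comes back with no suggestion (like "USA" here) still needs a manual lookup.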
@andysouth The country argument seems inconsistent with the arguments continent, geounit, and sovereignty, which align with field names in the NE data. Would you be open to a pull request that either adds an argument explicitly named admin, or updates the function documentation to state which variable the argument filters on? It isn't obvious what country filters on unless you look at the function code.
If I may, improving on the example of base R's OlsonNames(), it'd be great to have a function with a straightforward name like CountryNames() that outputs the full list of the admin field, and to reference it in the documentation of the country argument of ne_countries.
Even better could be a GeoNames() that outputs a data frame with columns countries (even if it's called "admin" in the original data), continent and sovereignty (plus any other potentially useful ones) that can be used in ne_countries() and other functions. That's what I thought I was getting with countries110.
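A rough sketch of what the proposed GeoNames() could look like, assuming it simply drops the geometry from the ne_countries() output and keeps the name-like columns. geo_names() is a hypothetical name, the default columns assume the lowercase Natural Earth field names (admin, name, name_long, sovereignt, geounit, continent), and the toy data frame is illustrative only.

```r
# Hypothetical helper sketching the proposed GeoNames(): return the
# name-like columns of a Natural Earth countries layer as a plain data
# frame, one row per distinct combination, sorted by the first column.
geo_names <- function(x,
                      cols = c("admin", "name", "name_long",
                               "sovereignt", "geounit", "continent")) {
  # as.data.frame() strips the sf class; selecting only `cols`
  # then leaves the geometry column behind.
  x <- as.data.frame(x)
  cols <- intersect(cols, names(x))
  if (length(cols) == 0) stop("none of the requested columns are present")
  out <- unique(x[, cols, drop = FALSE])
  out <- out[order(out[[cols[1]]]), , drop = FALSE]
  rownames(out) <- NULL
  out
}

# Toy input for illustration; these are not real Natural Earth rows.
toy <- data.frame(
  admin = c("United States of America", "Kenya", "Kenya"),
  name = c("United States", "Kenya", "Kenya"),
  continent = c("North America", "Africa", "Africa"),
  stringsAsFactors = FALSE
)
g <- geo_names(toy)
g
```

With rnaturalearth installed, geo_names(rnaturalearth::ne_countries(returnclass = "sf")) would return such a table, and its admin column holds the values the country argument filters on.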
Having an argument called admin alongside the argument country in ne_countries will almost certainly be confusing.
Use cases include having a vector of, say, 100 country names not necessarily attached to a continent or other higher-order classification. It would then be easy to check quickly whether all the names match and where they match, e.g., do I match Taiwan, Republic of China, Chinese Taipei, or Taiwan (China) under sovereignty or country/admin?
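That kind of multi-field check could be sketched as follows, assuming a lookup table with an admin column plus other name columns (for example the attributes of ne_countries(returnclass = "sf") with the geometry dropped). match_country_fields() is hypothetical, and the toy lookup below is for illustration only, not real Natural Earth rows.

```r
# Hypothetical helper (not part of rnaturalearth): for each input name,
# find the first field (in the order given) containing an exact match and
# return the admin value of that row, since admin is what
# ne_countries(country = ...) filters on. Unmatched names get NA.
match_country_fields <- function(x, lookup,
                                 fields = c("admin", "name", "name_long",
                                            "sovereignt")) {
  fields <- intersect(fields, names(lookup))
  res <- data.frame(input = x,
                    field = rep(NA_character_, length(x)),
                    admin = rep(NA_character_, length(x)),
                    stringsAsFactors = FALSE)
  for (i in seq_along(x)) {
    for (f in fields) {
      hit <- match(x[i], lookup[[f]])
      if (!is.na(hit)) {
        res$field[i] <- f
        res$admin[i] <- as.character(lookup$admin[hit])
        break  # stop at the first field that matches
      }
    }
  }
  res
}

# Toy lookup table for illustration only (not real Natural Earth rows).
lookup <- data.frame(
  admin = c("United States of America", "Taiwan"),
  name = c("United States", "Taiwan"),
  sovereignt = c("United States of America", "Taiwan"),
  stringsAsFactors = FALSE
)
m <- match_country_fields(c("United States", "Taiwan", "Chinese Taipei"), lookup)
m
```

The field column shows where each name matched, and the admin column gives the value to pass to the country argument; rows left as NA need a manual decision.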
I'd volunteer to implement this as it doesn't seem terribly complicated... but I'm hardly a professional programmer and don't know the package particularly well!
{rnaturalearth} is an excellent package (thanks to @andysouth) with the most complete country information out there. With it, you can simply filter for what you want. Would the result below be satisfactory for you?
if (!require("tidyverse")) install.packages("tidyverse")
#> Loading required package: tidyverse
if (!require("sf")) install.packages("sf")
#> Loading required package: sf
#> Linking to GEOS 3.8.0, GDAL 3.0.4, PROJ 6.3.2
if (!require("rnaturalearth")) install.packages("rnaturalearth")
#> Loading required package: rnaturalearth
world_data <- rnaturalearth::ne_countries(scale = "medium", returnclass = "sf")
world_data %>%
  filter_all(., any_vars(str_detect(., "Sub-Saharan"))) %>%
  ggplot() +
  geom_sf()
#> Warning in stri_detect_regex(string, pattern, negate = negate, opts_regex =
#> opts(pattern)): argument is not an atomic vector; coercing
Created on 2020-08-04 by the reprex package (v0.3.0)
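For comparison, a more targeted sketch: the filter_all()/str_detect() call above scans every column, including the geometry (presumably what triggers the "not an atomic vector" warning), and presumably matches on the World Bank region field, which in Natural Earth holds the value "Sub-Saharan Africa". Filtering on that column directly avoids scanning everything. This assumes the lowercase region_wb column returned by ne_countries(); check names(world_data) if your version differs.

```r
library(rnaturalearth)
library(ggplot2)

world_data <- ne_countries(scale = "medium", returnclass = "sf")

# Keep only rows whose World Bank region is Sub-Saharan Africa.
# %in% (rather than ==) treats any NA region as a non-match.
ssa <- world_data[world_data$region_wb %in% "Sub-Saharan Africa", ]

ggplot(ssa) +
  geom_sf()
```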
I made a function to streamline that search (check here). It is not perfect, but it works.
Thanks All,
Useful discussion.
I feel your pain @AltfunsMA, @jessexknight, I've been there many times. Country naming is a minefield, I would love to have the time to make better solutions.
Great stuff @kguidonimartins: using filter_all, any_vars and str_detect is a genius way to subset Sub-Saharan countries; I wouldn't have known how to do that.
Incidentally @jessexknight, we are partway through developing dedicated admin boundary solutions for Africa that may be of interest to you. Input welcome: https://github.com/afrimapr/afriadmin
Let me think about these things this week.
@kguidonimartins, superdplyr to the rescue! XD
@andysouth, the package is great as it is; these are really minor tweaks! Thanks so much for your work!
Hi! I'm here because one of my students was having problems getting a list of countries. It would be great if the documentation pointed to a list of valid country/state names, etc. Maybe to the Natural Earth site? For example, this page: https://www.naturalearthdata.com/downloads/10m-cultural-vectors/10m-admin-0-details/ ?
That is a good idea:
library(sf)
#> Linking to GEOS 3.11.1, GDAL 3.7.1, PROJ 9.2.0; sf_use_s2() is TRUE
df <- read_sf("/vsizip/vsicurl/https://www.naturalearthdata.com/http//www.naturalearthdata.com/download/10m/cultural/ne_10m_admin_0_sovereignty.zip")
head(df)
#> Simple feature collection with 6 features and 168 fields
#> Geometry type: MULTIPOLYGON
#> Dimension: XY
#> Bounding box: xmin: -109.4537 ymin: -55.9185 xmax: 140.9776 ymax: 7.35578
#> Geodetic CRS: WGS 84
#> # A tibble: 6 × 169
#> featurecla scalerank LABELRANK SOVEREIGNT SOV_A3 ADM0_DIF LEVEL TYPE TLC
#> <chr> <int> <int> <chr> <chr> <int> <int> <chr> <chr>
#> 1 Admin-0 sove… 5 2 Indonesia IDN 0 2 Sove… 1
#> 2 Admin-0 sove… 5 3 Malaysia MYS 0 2 Sove… 1
#> 3 Admin-0 sove… 0 2 Chile CHL 0 2 Sove… 1
#> 4 Admin-0 sove… 0 3 Bolivia BOL 0 2 Sove… 1
#> 5 Admin-0 sove… 0 2 Peru PER 0 2 Sove… 1
#> 6 Admin-0 sove… 0 2 Argentina ARG 0 2 Sove… 1
#> # ℹ 160 more variables: ADMIN <chr>, ADM0_A3 <chr>, GEOU_DIF <int>,
#> # GEOUNIT <chr>, GU_A3 <chr>, SU_DIF <int>, SUBUNIT <chr>, SU_A3 <chr>,
#> # BRK_DIFF <int>, NAME <chr>, NAME_LONG <chr>, BRK_A3 <chr>, BRK_NAME <chr>,
#> # BRK_GROUP <chr>, ABBREV <chr>, POSTAL <chr>, FORMAL_EN <chr>,
#> # FORMAL_FR <chr>, NAME_CIAWF <chr>, NOTE_ADM0 <chr>, NOTE_BRK <chr>,
#> # NAME_SORT <chr>, NAME_ALT <chr>, MAPCOLOR7 <int>, MAPCOLOR8 <int>,
#> # MAPCOLOR9 <int>, MAPCOLOR13 <int>, POP_EST <dbl>, POP_RANK <int>, …
names(df)
#> [1] "featurecla" "scalerank" "LABELRANK" "SOVEREIGNT" "SOV_A3"
#> [6] "ADM0_DIF" "LEVEL" "TYPE" "TLC" "ADMIN"
#> [11] "ADM0_A3" "GEOU_DIF" "GEOUNIT" "GU_A3" "SU_DIF"
#> [16] "SUBUNIT" "SU_A3" "BRK_DIFF" "NAME" "NAME_LONG"
#> [21] "BRK_A3" "BRK_NAME" "BRK_GROUP" "ABBREV" "POSTAL"
#> [26] "FORMAL_EN" "FORMAL_FR" "NAME_CIAWF" "NOTE_ADM0" "NOTE_BRK"
#> [31] "NAME_SORT" "NAME_ALT" "MAPCOLOR7" "MAPCOLOR8" "MAPCOLOR9"
#> [36] "MAPCOLOR13" "POP_EST" "POP_RANK" "POP_YEAR" "GDP_MD"
#> [41] "GDP_YEAR" "ECONOMY" "INCOME_GRP" "FIPS_10" "ISO_A2"
#> [46] "ISO_A2_EH" "ISO_A3" "ISO_A3_EH" "ISO_N3" "ISO_N3_EH"
#> [51] "UN_A3" "WB_A2" "WB_A3" "WOE_ID" "WOE_ID_EH"
#> [56] "WOE_NOTE" "ADM0_ISO" "ADM0_DIFF" "ADM0_TLC" "ADM0_A3_US"
#> [61] "ADM0_A3_FR" "ADM0_A3_RU" "ADM0_A3_ES" "ADM0_A3_CN" "ADM0_A3_TW"
#> [66] "ADM0_A3_IN" "ADM0_A3_NP" "ADM0_A3_PK" "ADM0_A3_DE" "ADM0_A3_GB"
#> [71] "ADM0_A3_BR" "ADM0_A3_IL" "ADM0_A3_PS" "ADM0_A3_SA" "ADM0_A3_EG"
#> [76] "ADM0_A3_MA" "ADM0_A3_PT" "ADM0_A3_AR" "ADM0_A3_JP" "ADM0_A3_KO"
#> [81] "ADM0_A3_VN" "ADM0_A3_TR" "ADM0_A3_ID" "ADM0_A3_PL" "ADM0_A3_GR"
#> [86] "ADM0_A3_IT" "ADM0_A3_NL" "ADM0_A3_SE" "ADM0_A3_BD" "ADM0_A3_UA"
#> [91] "ADM0_A3_UN" "ADM0_A3_WB" "CONTINENT" "REGION_UN" "SUBREGION"
#> [96] "REGION_WB" "NAME_LEN" "LONG_LEN" "ABBREV_LEN" "TINY"
#> [101] "HOMEPART" "MIN_ZOOM" "MIN_LABEL" "MAX_LABEL" "LABEL_X"
#> [106] "LABEL_Y" "NE_ID" "WIKIDATAID" "NAME_AR" "NAME_BN"
#> [111] "NAME_DE" "NAME_EN" "NAME_ES" "NAME_FA" "NAME_FR"
#> [116] "NAME_EL" "NAME_HE" "NAME_HI" "NAME_HU" "NAME_ID"
#> [121] "NAME_IT" "NAME_JA" "NAME_KO" "NAME_NL" "NAME_PL"
#> [126] "NAME_PT" "NAME_RU" "NAME_SV" "NAME_TR" "NAME_UK"
#> [131] "NAME_UR" "NAME_VI" "NAME_ZH" "NAME_ZHT" "FCLASS_ISO"
#> [136] "TLC_DIFF" "FCLASS_TLC" "FCLASS_US" "FCLASS_FR" "FCLASS_RU"
#> [141] "FCLASS_ES" "FCLASS_CN" "FCLASS_TW" "FCLASS_IN" "FCLASS_NP"
#> [146] "FCLASS_PK" "FCLASS_DE" "FCLASS_GB" "FCLASS_BR" "FCLASS_IL"
#> [151] "FCLASS_PS" "FCLASS_SA" "FCLASS_EG" "FCLASS_MA" "FCLASS_PT"
#> [156] "FCLASS_AR" "FCLASS_JP" "FCLASS_KO" "FCLASS_VN" "FCLASS_TR"
#> [161] "FCLASS_ID" "FCLASS_PL" "FCLASS_GR" "FCLASS_IT" "FCLASS_NL"
#> [166] "FCLASS_SE" "FCLASS_BD" "FCLASS_UA" "geometry"
Created on 2023-11-15 with reprex v2.0.2
To my surprise, ne_countries(country = "United States") and ne_countries(country = "USA") do not work, but ne_countries(country = "United Kingdom") does. I searched countries110 and found "United States" as a country descriptor, so I thought it was a bug rather than just the fact that I had to type ne_countries(country = "United States of America"). It is unclear which field should be used for specifying country names, and it seems odd to require the full formal name for the USA but not for the United Kingdom of Great Britain and Northern Ireland or the Commonwealth of Australia, etc.
Thanks for a great package!
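To see which countries are affected by this kind of mismatch, one can compare the name and admin columns directly. This is a minimal sketch: admin_name_differences() is hypothetical, and the toy rows only mirror the behaviour reported above (they are not copied from Natural Earth). With real data, lookup would be sf::st_drop_geometry(ne_countries(returnclass = "sf")); which rows appear will depend on the Natural Earth version bundled with your rnaturalearth install.

```r
# Hypothetical helper: list rows whose display name differs from the admin
# value, i.e. names that would not work as-is in ne_countries(country = ...).
admin_name_differences <- function(lookup) {
  differs <- lookup$name != lookup$admin
  differs[is.na(differs)] <- FALSE  # treat missing values as "no difference"
  out <- lookup[differs, c("name", "admin"), drop = FALSE]
  rownames(out) <- NULL
  out
}

# Toy rows for illustration only, mirroring the mismatch described above.
lookup <- data.frame(
  admin = c("United States of America", "United Kingdom", "Australia"),
  name = c("United States", "United Kingdom", "Australia"),
  stringsAsFactors = FALSE
)
d <- admin_name_differences(lookup)
d
```

Each returned row pairs a familiar name with the admin string that the country argument actually expects.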