ropensci / rnaturalearth

An R package to hold and facilitate interaction with natural earth map data :earth_africa:
http://ropensci.github.io/rnaturalearth/

Listing available country names #34

Open AltfunsMA opened 4 years ago

AltfunsMA commented 4 years ago

To my surprise ne_countries(country = "United States") or ne_countries(country = "USA") do not work but ne_countries(country = "United Kingdom") does.

I searched countries110 and found "United States" as a country descriptor, so I thought it was a bug rather than simply having to type ne_countries(country = "United States of America"). It is unclear which field should be used for specifying country names, and it seems odd to require the full formal name for the USA but not for the United Kingdom of Great Britain and Northern Ireland or the Commonwealth of Australia, etc.

Thanks for a great package!

jessexknight commented 3 years ago

I have spent the last 3 hours trying to plot a complete map of Africa!

The fact that X = ne_countries(country=...) does not use the same strings as X$name or even X$name_long is highly unexpected behaviour.

Nowosad commented 3 years ago

(@jessexknight it would probably be easier just to download the world country map and then subset it by the continent name)

jessexknight commented 3 years ago

Thanks, but my data are for Sub-Saharan Africa only, and in general I think it is reasonable to select countries arbitrarily by name.

mps9506 commented 3 years ago

I realize the naming is sometimes inconsistent in the natural earth source files (the use of "united states" versus "united states of america"). However, the names accepted by the country argument in ne_countries() are those in the admin field of the natural earth source files. One way to access those names is:

df <- ne_countries(continent = "Africa", returnclass = "sf")
df$admin

@andysouth The country argument seems inconsistent with the arguments continent, geounit, and sovereignty, which align with field names in the NE data. Would you be open to a pull request that either adds an argument explicitly named admin, or updates the function documentation to state which variable the argument filters on? It isn't obvious what country is filtering on unless you look at the function code.
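A minimal sketch of that lookup, assuming (as described above) that country is matched against the admin column. The search pattern "United" is only an illustration:

```r
library(sf)
library(rnaturalearth)

# Default scale (110m); the names available may differ at other scales.
world <- ne_countries(returnclass = "sf")

# Assumption: these are the strings accepted by the `country` argument.
valid_names <- sort(unique(world$admin))

# Search the valid names for a partial match before calling ne_countries().
grep("United", valid_names, value = TRUE)

# Use the exact admin string once it is known.
usa <- ne_countries(country = "United States of America", returnclass = "sf")
```

Searching the vector first avoids guessing between short and formal names.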

AltfunsMA commented 3 years ago

If I may, improving on the example of lubridate::OlsonNames(), it'd be great to have a function with a straightforward name like CountryNames() that outputs the full list of the admin field and reference that in the documentation under the country argument for ne_countries.

Even better could be a GeoNames() that outputs a dataframe with columns: countries (even if it's called "admin" in the original data), continent and sovereignty (plus any other potentially useful ones) that can be used in ne_countries() and other functions. That's what I thought I was getting with countries110.

Having an argument called admin alongside the argument country in ne_countries will almost certainly be confusing.

Use cases include when you have a vector of say 100 country names not necessarily attached to a continent or other higher order classification. Quickly checking whether all the names match and where they match will be easy, e.g., do I match Taiwan, Republic of China, Chinese Taipei, or Taiwan (China) under sovereignty or country/admin?

I'd volunteer to implement this as it doesn't seem terribly complicated... but I'm hardly a professional programmer and don't know the package particularly well!
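As a rough sketch of the name-checking use case, something like the following could report which fields contain each name. The function name match_country_names() is hypothetical and not part of rnaturalearth, and the small reference table is made up for illustration rather than copied from the Natural Earth data:

```r
# Hypothetical helper: for each name in `x`, report which columns of `ref`
# contain an exact match. Returns a logical matrix (names x fields).
match_country_names <- function(x, ref,
                                fields = c("admin", "sovereignt", "geounit",
                                           "name", "name_long")) {
  # Only check the fields that actually exist in the reference table.
  fields <- intersect(fields, names(ref))
  hits <- vapply(fields, function(f) x %in% ref[[f]], logical(length(x)))
  # vapply returns a plain vector when length(x) == 1; restore matrix shape.
  matrix(hits, nrow = length(x), dimnames = list(x, fields))
}

# Illustrative reference table (made-up values, not the real NE attributes).
# With the package, one could instead try:
#   ref <- sf::st_drop_geometry(rnaturalearth::ne_countries(returnclass = "sf"))
ref <- data.frame(
  admin      = c("United States of America", "United Kingdom", "Taiwan"),
  sovereignt = c("United States of America", "United Kingdom", "Taiwan"),
  name       = c("United States", "United Kingdom", "Taiwan"),
  stringsAsFactors = FALSE
)

res <- match_country_names(c("United States", "Chinese Taipei"), ref)
res
```

Rows that are FALSE in every column are the names that need attention before they are passed to ne_countries().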

kguidonimartins commented 3 years ago

> Thanks, but my data are for Sub-Saharan Africa only, and in general I think it is reasonable to select countries arbitrarily by name.

{rnaturalearth} is an excellent package (thanks to @andysouth) with the most complete information out there about countries. With this in hand, you can just filter out what you want. Would the result below be satisfactory for you?

if (!require("tidyverse")) install.packages("tidyverse")
#> Loading required package: tidyverse
if (!require("sf")) install.packages("sf")
#> Loading required package: sf
#> Linking to GEOS 3.8.0, GDAL 3.0.4, PROJ 6.3.2
if (!require("rnaturalearth")) install.packages("rnaturalearth")
#> Loading required package: rnaturalearth

world_data <- rnaturalearth::ne_countries(scale = "medium", returnclass = "sf")

world_data %>%
  filter_all(., any_vars(str_detect(., "Sub-Saharan"))) %>%
  ggplot() +
  geom_sf()
#> Warning in stri_detect_regex(string, pattern, negate = negate, opts_regex =
#> opts(pattern)): argument is not an atomic vector; coercing

Created on 2020-08-04 by the reprex package (v0.3.0)

I made a function to optimize that search for me (check here). It is not perfect, but works.
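A variant of the filter above, sketched on the assumption that dplyr 1.0.4 or later is installed (for if_any()). Restricting the search to character columns skips the geometry column, which is likely what triggers the coercion warning in the reprex:

```r
library(dplyr)
library(stringr)
library(sf)
library(rnaturalearth)

# Default scale (110m) is used here; the reprex above used scale = "medium".
world_data <- ne_countries(returnclass = "sf")

# Keep rows where any character column mentions "Sub-Saharan".
sub_saharan <- world_data %>%
  filter(if_any(where(is.character), ~ str_detect(.x, "Sub-Saharan")))

# Inspect which countries were kept.
sort(sub_saharan$admin)
```

The result is still an sf object, so it can be passed to ggplot() + geom_sf() as before.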

andysouth commented 3 years ago

Thanks All,

Useful discussion.

I feel your pain @AltfunsMA, @jessexknight; I've been there many times. Country naming is a minefield, and I would love to have the time to make better solutions.

Great stuff @kguidonimartins, using filter_all, any_vars and str_detect is a genius way to subset Sub-Saharan countries; I wouldn't have known how to do that.

Incidentally @jessexknight, we are part way through dedicated admin boundary solutions for Africa that may be of interest to you. Input welcome: https://github.com/afrimapr/afriadmin

Let me think about these things this week.

AltfunsMA commented 3 years ago

@kguidonimartins, superdplyr to the rescue! XD @andysouth, the package is great as it is; these are really minor tweaks! Thanks so much for your work!

eliocamp commented 7 months ago

Hi! I'm here because one of my students actually was having problems trying to get a list of countries. It would be great if the documentation pointed to a list of valid country/state names, etc. Maybe to the naturalearth site? For example, this page: https://www.naturalearthdata.com/downloads/10m-cultural-vectors/10m-admin-0-details/ ?

PMassicotte commented 7 months ago

That is a good idea:

library(sf)
#> Linking to GEOS 3.11.1, GDAL 3.7.1, PROJ 9.2.0; sf_use_s2() is TRUE

df <- read_sf("/vsizip/vsicurl/https://www.naturalearthdata.com/http//www.naturalearthdata.com/download/10m/cultural/ne_10m_admin_0_sovereignty.zip")

head(df)
#> Simple feature collection with 6 features and 168 fields
#> Geometry type: MULTIPOLYGON
#> Dimension:     XY
#> Bounding box:  xmin: -109.4537 ymin: -55.9185 xmax: 140.9776 ymax: 7.35578
#> Geodetic CRS:  WGS 84
#> # A tibble: 6 × 169
#>   featurecla    scalerank LABELRANK SOVEREIGNT SOV_A3 ADM0_DIF LEVEL TYPE  TLC  
#>   <chr>             <int>     <int> <chr>      <chr>     <int> <int> <chr> <chr>
#> 1 Admin-0 sove…         5         2 Indonesia  IDN           0     2 Sove… 1    
#> 2 Admin-0 sove…         5         3 Malaysia   MYS           0     2 Sove… 1    
#> 3 Admin-0 sove…         0         2 Chile      CHL           0     2 Sove… 1    
#> 4 Admin-0 sove…         0         3 Bolivia    BOL           0     2 Sove… 1    
#> 5 Admin-0 sove…         0         2 Peru       PER           0     2 Sove… 1    
#> 6 Admin-0 sove…         0         2 Argentina  ARG           0     2 Sove… 1    
#> # ℹ 160 more variables: ADMIN <chr>, ADM0_A3 <chr>, GEOU_DIF <int>,
#> #   GEOUNIT <chr>, GU_A3 <chr>, SU_DIF <int>, SUBUNIT <chr>, SU_A3 <chr>,
#> #   BRK_DIFF <int>, NAME <chr>, NAME_LONG <chr>, BRK_A3 <chr>, BRK_NAME <chr>,
#> #   BRK_GROUP <chr>, ABBREV <chr>, POSTAL <chr>, FORMAL_EN <chr>,
#> #   FORMAL_FR <chr>, NAME_CIAWF <chr>, NOTE_ADM0 <chr>, NOTE_BRK <chr>,
#> #   NAME_SORT <chr>, NAME_ALT <chr>, MAPCOLOR7 <int>, MAPCOLOR8 <int>,
#> #   MAPCOLOR9 <int>, MAPCOLOR13 <int>, POP_EST <dbl>, POP_RANK <int>, …

names(df)
#>   [1] "featurecla" "scalerank"  "LABELRANK"  "SOVEREIGNT" "SOV_A3"    
#>   [6] "ADM0_DIF"   "LEVEL"      "TYPE"       "TLC"        "ADMIN"     
#>  [11] "ADM0_A3"    "GEOU_DIF"   "GEOUNIT"    "GU_A3"      "SU_DIF"    
#>  [16] "SUBUNIT"    "SU_A3"      "BRK_DIFF"   "NAME"       "NAME_LONG" 
#>  [21] "BRK_A3"     "BRK_NAME"   "BRK_GROUP"  "ABBREV"     "POSTAL"    
#>  [26] "FORMAL_EN"  "FORMAL_FR"  "NAME_CIAWF" "NOTE_ADM0"  "NOTE_BRK"  
#>  [31] "NAME_SORT"  "NAME_ALT"   "MAPCOLOR7"  "MAPCOLOR8"  "MAPCOLOR9" 
#>  [36] "MAPCOLOR13" "POP_EST"    "POP_RANK"   "POP_YEAR"   "GDP_MD"    
#>  [41] "GDP_YEAR"   "ECONOMY"    "INCOME_GRP" "FIPS_10"    "ISO_A2"    
#>  [46] "ISO_A2_EH"  "ISO_A3"     "ISO_A3_EH"  "ISO_N3"     "ISO_N3_EH" 
#>  [51] "UN_A3"      "WB_A2"      "WB_A3"      "WOE_ID"     "WOE_ID_EH" 
#>  [56] "WOE_NOTE"   "ADM0_ISO"   "ADM0_DIFF"  "ADM0_TLC"   "ADM0_A3_US"
#>  [61] "ADM0_A3_FR" "ADM0_A3_RU" "ADM0_A3_ES" "ADM0_A3_CN" "ADM0_A3_TW"
#>  [66] "ADM0_A3_IN" "ADM0_A3_NP" "ADM0_A3_PK" "ADM0_A3_DE" "ADM0_A3_GB"
#>  [71] "ADM0_A3_BR" "ADM0_A3_IL" "ADM0_A3_PS" "ADM0_A3_SA" "ADM0_A3_EG"
#>  [76] "ADM0_A3_MA" "ADM0_A3_PT" "ADM0_A3_AR" "ADM0_A3_JP" "ADM0_A3_KO"
#>  [81] "ADM0_A3_VN" "ADM0_A3_TR" "ADM0_A3_ID" "ADM0_A3_PL" "ADM0_A3_GR"
#>  [86] "ADM0_A3_IT" "ADM0_A3_NL" "ADM0_A3_SE" "ADM0_A3_BD" "ADM0_A3_UA"
#>  [91] "ADM0_A3_UN" "ADM0_A3_WB" "CONTINENT"  "REGION_UN"  "SUBREGION" 
#>  [96] "REGION_WB"  "NAME_LEN"   "LONG_LEN"   "ABBREV_LEN" "TINY"      
#> [101] "HOMEPART"   "MIN_ZOOM"   "MIN_LABEL"  "MAX_LABEL"  "LABEL_X"   
#> [106] "LABEL_Y"    "NE_ID"      "WIKIDATAID" "NAME_AR"    "NAME_BN"   
#> [111] "NAME_DE"    "NAME_EN"    "NAME_ES"    "NAME_FA"    "NAME_FR"   
#> [116] "NAME_EL"    "NAME_HE"    "NAME_HI"    "NAME_HU"    "NAME_ID"   
#> [121] "NAME_IT"    "NAME_JA"    "NAME_KO"    "NAME_NL"    "NAME_PL"   
#> [126] "NAME_PT"    "NAME_RU"    "NAME_SV"    "NAME_TR"    "NAME_UK"   
#> [131] "NAME_UR"    "NAME_VI"    "NAME_ZH"    "NAME_ZHT"   "FCLASS_ISO"
#> [136] "TLC_DIFF"   "FCLASS_TLC" "FCLASS_US"  "FCLASS_FR"  "FCLASS_RU" 
#> [141] "FCLASS_ES"  "FCLASS_CN"  "FCLASS_TW"  "FCLASS_IN"  "FCLASS_NP" 
#> [146] "FCLASS_PK"  "FCLASS_DE"  "FCLASS_GB"  "FCLASS_BR"  "FCLASS_IL" 
#> [151] "FCLASS_PS"  "FCLASS_SA"  "FCLASS_EG"  "FCLASS_MA"  "FCLASS_PT" 
#> [156] "FCLASS_AR"  "FCLASS_JP"  "FCLASS_KO"  "FCLASS_VN"  "FCLASS_TR" 
#> [161] "FCLASS_ID"  "FCLASS_PL"  "FCLASS_GR"  "FCLASS_IT"  "FCLASS_NL" 
#> [166] "FCLASS_SE"  "FCLASS_BD"  "FCLASS_UA"  "geometry"

Created on 2023-11-15 with reprex v2.0.2