Unexpected behavior of medium size country polygon download / while small & large resolution work as advertised #105

Open jlacko opened 6 months ago

jlacko commented 6 months ago

The ne_countries() function behaves unexpectedly when downloading a country polygon at medium (1:50M) resolution. This is surprising, since the very same code works as expected at small or large resolution.

For a reproducible example consider this piece of code:

# this will crash...
rnaturalearth::ne_countries(country = "Czechia", scale = "medium") 

# ... while these two will work perfectly OK
rnaturalearth::ne_countries(country = "Czechia", scale = "small") 
rnaturalearth::ne_countries(country = "Czechia", scale = "large")

PMassicotte commented 6 months ago

Works fine on my side

rnaturalearth::ne_countries(country = "Czechia", scale = "medium")
#> Simple feature collection with 1 feature and 168 fields
#> Geometry type: MULTIPOLYGON
#> Dimension:     XY
#> Bounding box:  xmin: 12.08975 ymin: 48.57622 xmax: 18.83223 ymax: 51.03779
#> Geodetic CRS:  WGS 84
#>          featurecla scalerank labelrank sovereignt sov_a3 adm0_dif level
#> 185 Admin-0 country         1         5    Czechia    CZE        0     2
#>                  type tlc   admin adm0_a3 geou_dif geounit gu_a3 su_dif subunit
#> 185 Sovereign country   1 Czechia     CZE        0 Czechia   CZE      0 Czechia
#>     su_a3 brk_diff    name      name_long brk_a3 brk_name brk_group abbrev
#> 185   CZE        0 Czechia Czech Republic    CZE  Czechia      <NA>    Cz.
#>     postal      formal_en             formal_fr name_ciawf note_adm0 note_brk
#> 185     CZ Czech Republic la République tchèque    Czechia      <NA>     <NA>
#>     name_sort name_alt mapcolor7 mapcolor8 mapcolor9 mapcolor13  pop_est
#> 185   Czechia    Česko         1         1         2          6 10669709
#>     pop_rank pop_year gdp_md gdp_year                    economy
#> 185       14     2019 250680     2019 2. Developed region: nonG7
#>               income_grp fips_10 iso_a2 iso_a2_eh iso_a3 iso_a3_eh iso_n3
#> 185 1. High income: OECD      EZ     CZ        CZ    CZE       CZE    203
#>     iso_n3_eh un_a3 wb_a2 wb_a3   woe_id woe_id_eh                   woe_note
#> 185       203   203    CZ   CZE 23424810  23424810 Exact WOE match as country
#>     adm0_iso adm0_diff adm0_tlc adm0_a3_us adm0_a3_fr adm0_a3_ru adm0_a3_es
#> 185      CZE      <NA>      CZE        CZE        CZE        CZE        CZE
#>     adm0_a3_cn adm0_a3_tw adm0_a3_in adm0_a3_np adm0_a3_pk adm0_a3_de
#> 185        CZE        CZE        CZE        CZE        CZE        CZE
#>     adm0_a3_gb adm0_a3_br adm0_a3_il adm0_a3_ps adm0_a3_sa adm0_a3_eg
#> 185        CZE        CZE        CZE        CZE        CZE        CZE
#>     adm0_a3_ma adm0_a3_pt adm0_a3_ar adm0_a3_jp adm0_a3_ko adm0_a3_vn
#> 185        CZE        CZE        CZE        CZE        CZE        CZE
#>     adm0_a3_tr adm0_a3_id adm0_a3_pl adm0_a3_gr adm0_a3_it adm0_a3_nl
#> 185        CZE        CZE        CZE        CZE        CZE        CZE
#>     adm0_a3_se adm0_a3_bd adm0_a3_ua adm0_a3_un adm0_a3_wb continent region_un
#> 185        CZE        CZE        CZE        -99        -99    Europe    Europe
#>          subregion             region_wb name_len long_len abbrev_len tiny
#> 185 Eastern Europe Europe & Central Asia        7       14          3  -99
#>     homepart min_zoom min_label max_label  label_x  label_y      ne_id
#> 185        1        0         4         9 15.37755 49.88236 1159320535
#>     wikidataid name_ar      name_bn    name_de        name_en         name_es
#> 185       Q213  التشيك চেক প্রজাতন্ত্র Tschechien Czech Republic República Checa
#>       name_fa  name_fr name_el name_he   name_hi    name_hu       name_id
#> 185 جمهوری چک Tchéquie  Τσεχία   צ'כיה चेक गणराज्य Csehország Republik Ceko
#>             name_it name_ja name_ko  name_nl name_pl name_pt name_ru  name_sv
#> 185 Repubblica Ceca  チェコ    체코 Tsjechië  Czechy Chéquia   Чехия Tjeckien
#>             name_tr name_uk     name_ur      name_vi name_zh   name_zht
#> 185 Çek Cumhuriyeti   Чехія چیک جمہوریہ Cộng hòa Séc    捷克 捷克共和國
#>          fclass_iso tlc_diff      fclass_tlc fclass_us fclass_fr fclass_ru
#> 185 Admin-0 country     <NA> Admin-0 country      <NA>      <NA>      <NA>
#>     fclass_es fclass_cn fclass_tw fclass_in fclass_np fclass_pk fclass_de
#> 185      <NA>      <NA>      <NA>      <NA>      <NA>      <NA>      <NA>
#>     fclass_gb fclass_br fclass_il fclass_ps fclass_sa fclass_eg fclass_ma
#> 185      <NA>      <NA>      <NA>      <NA>      <NA>      <NA>      <NA>
#>     fclass_pt fclass_ar fclass_jp fclass_ko fclass_vn fclass_tr fclass_id
#> 185      <NA>      <NA>      <NA>      <NA>      <NA>      <NA>      <NA>
#>     fclass_pl fclass_gr fclass_it fclass_nl fclass_se fclass_bd fclass_ua
#> 185      <NA>      <NA>      <NA>      <NA>      <NA>      <NA>      <NA>
#>                           geometry
#> 185 MULTIPOLYGON (((18.83223 49...

PMassicotte commented 6 months ago

I think you might want to update rnaturalearthdata
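
One way to check which versions are actually installed before updating, as a minimal sketch. The helper name `installed_version()` is hypothetical and not part of either package; it relies only on base R's `requireNamespace()` and `utils::packageVersion()`.

```r
# Hypothetical helper (not part of rnaturalearth): report the installed
# version of a package, or NA if the package is not installed.
installed_version <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    return(NA_character_)
  }
  as.character(utils::packageVersion(pkg))
}

pkgs <- c("rnaturalearth", "rnaturalearthdata")
versions <- vapply(pkgs, installed_version, character(1))
print(versions)

# To update from CRAN (left commented out so the sketch has no side effects):
# install.packages("rnaturalearthdata")
```

If the two machines report different versions of rnaturalearthdata, that would be a plausible explanation for the differing results.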

jlacko commented 6 months ago

That could possibly be the issue, although I am using the latest CRAN version of both rnaturalearth and rnaturalearthdata. On closer investigation I see this:

setdiff(rnaturalearth::ne_countries(scale = "medium")$admin,
        rnaturalearth::ne_countries(scale = "small")$admin)
#  [1] "Aruba"                                    "Anguilla"                                
#  [3] "Aland"                                    "Andorra"                                 
#  [5] "American Samoa"                           "Ashmore and Cartier Islands"             
#  [7] "Antigua and Barbuda"                      "Bahrain"                                 
#  [9] "Saint Barthelemy"                         "Bermuda"                                 
# [11] "Barbados"                                 "Republic of Congo"                       
# [13] "Cook Islands"                             "Comoros"                                 
# [15] "Cape Verde"                               "Curaçao"                                 
# [17] "Cayman Islands"                           "Czech Republic"                          
# [19] "Dominica"                                 "Faroe Islands"                           
# [21] "Federated States of Micronesia"           "Guernsey"                                
# [23] "Guinea Bissau"                            "Grenada"                                 
# [25] "Guam"                                     "Hong Kong S.A.R."                        
# [27] "Heard Island and McDonald Islands"        "Isle of Man"                             
# [29] "Indian Ocean Territories"                 "British Indian Ocean Territory"          
# [31] "Jersey"                                   "Siachen Glacier"                         
# [33] "Kiribati"                                 "Saint Kitts and Nevis"                   
# [35] "Saint Lucia"                              "Liechtenstein"                           
# [37] "Macao S.A.R"                              "Saint Martin"                            
# [39] "Monaco"                                   "Maldives"                                
# [41] "Marshall Islands"                         "Macedonia"                               
# [43] "Malta"                                    "Northern Mariana Islands"                
# [45] "Montserrat"                               "Mauritius"                               
# [47] "Norfolk Island"                           "Niue"                                    
# [49] "Nauru"                                    "Pitcairn Islands"                        
# [51] "Palau"                                    "French Polynesia"                        
# [53] "Singapore"                                "South Georgia and South Sandwich Islands"
# [55] "Saint Helena"                             "San Marino"                              
# [57] "Saint Pierre and Miquelon"                "Sao Tome and Principe"                   
# [59] "Swaziland"                                "Sint Maarten"                            
# [61] "Seychelles"                               "Turks and Caicos Islands"                
# [63] "Tonga"                                    "Vatican"                                 
# [65] "Saint Vincent and the Grenadines"         "British Virgin Islands"                  
# [67] "United States Virgin Islands"             "Wallis and Futuna"                       
# [69] "Samoa"                                

which is kind of unexpected, but not exactly wrong: Czechia is listed, but under the name "Czech Republic" (number 18).

But if you think it is OK on a standard installation then feel free to close the issue. I can sidestep the issue very easily; I just wanted to give you a heads-up in case the problem pops up in a more serious use case than mine.
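
A sidestep of this kind can be sketched as follows: instead of relying on one spelling of the country name, match against several known spellings, or against the three-letter code in the `adm0_a3` column (shown as `CZE` in the output earlier in this thread). The helper `find_country()` and the toy data frame are hypothetical illustrations, not part of rnaturalearth.

```r
# Hypothetical helper: return the rows whose `admin` value matches any of
# the given spellings, so a rename in the data does not break the lookup.
find_country <- function(countries, aliases) {
  countries[countries$admin %in% aliases, , drop = FALSE]
}

# Toy stand-in for the attribute table of ne_countries(scale = "medium");
# only the two columns used here are reproduced.
toy <- data.frame(
  admin   = c("Austria", "Czech Republic", "Slovakia"),
  adm0_a3 = c("AUT", "CZE", "SVK"),
  stringsAsFactors = FALSE
)

# Match on either spelling of the name ...
by_name <- find_country(toy, c("Czechia", "Czech Republic"))

# ... or on the country code, which does not depend on the spelling.
by_code <- toy[toy$adm0_a3 == "CZE", , drop = FALSE]

print(by_name)
print(by_code)
```

With the real data, the same row subsetting would presumably be applied to the object returned by `ne_countries()`; that has not been verified here.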

PMassicotte commented 6 months ago

Thank you for reporting.

I just downloaded the data directly from Natural Earth, and it looks like this is coming from the source data:

library(sf)
#> Linking to GEOS 3.12.1, GDAL 3.8.0, PROJ 9.2.0; sf_use_s2() is TRUE

small <- st_read("/vsizip/vsicurl/")
#> Reading layer `ne_110m_admin_0_countries' from data source 
#>   `/vsizip/vsicurl/' 
#>   using driver `ESRI Shapefile'
#> Simple feature collection with 177 features and 168 fields
#> Geometry type: MULTIPOLYGON
#> Dimension:     XY
#> Bounding box:  xmin: -180 ymin: -90 xmax: 180 ymax: 83.64513
#> Geodetic CRS:  WGS 84

medium <- st_read("/vsizip/vsicurl/")
#> Reading layer `ne_50m_admin_0_countries' from data source 
#>   `/vsizip/vsicurl/' 
#>   using driver `ESRI Shapefile'
#> Warning in CPL_read_ogr(dsn, layer, query, as.character(options), quiet, : GDAL
#> Message 1:
#> /vsizip/vsicurl/
#> contains polygon(s) with rings with invalid winding order. Autocorrecting them,
#> but that shapefile should be corrected using ogr2ogr for example.
#> Simple feature collection with 242 features and 168 fields
#> Geometry type: MULTIPOLYGON
#> Dimension:     XY
#> Bounding box:  xmin: -180 ymin: -89.99893 xmax: 180 ymax: 83.59961
#> Geodetic CRS:  WGS 84

setdiff(small$ADMIN, medium$ADMIN)
#> character(0)
setdiff(medium$ADMIN, small$ADMIN)
#>  [1] "Vatican"                           "Federated States of Micronesia"   
#>  [3] "Marshall Islands"                  "Northern Mariana Islands"         
#>  [5] "United States Virgin Islands"      "Guam"                             
#>  [7] "American Samoa"                    "South Georgia and the Islands"    
#>  [9] "British Indian Ocean Territory"    "Saint Helena"                     
#> [11] "Pitcairn Islands"                  "Anguilla"                         
#> [13] "Cayman Islands"                    "Bermuda"                          
#> [15] "British Virgin Islands"            "Turks and Caicos Islands"         
#> [17] "Montserrat"                        "Jersey"                           
#> [19] "Guernsey"                          "Isle of Man"                      
#> [21] "Tonga"                             "Singapore"                        
#> [23] "Seychelles"                        "São Tomé and Principe"            
#> [25] "San Marino"                        "Samoa"                            
#> [27] "Saint Vincent and the Grenadines"  "Saint Lucia"                      
#> [29] "Saint Kitts and Nevis"             "Palau"                            
#> [31] "Niue"                              "Cook Islands"                     
#> [33] "Aruba"                             "Curaçao"                          
#> [35] "Nauru"                             "Monaco"                           
#> [37] "Mauritius"                         "Malta"                            
#> [39] "Maldives"                          "Liechtenstein"                    
#> [41] "Kiribati"                          "Grenada"                          
#> [43] "Saint Pierre and Miquelon"         "Wallis and Futuna"                
#> [45] "Saint Martin"                      "Saint Barthelemy"                 
#> [47] "French Polynesia"                  "Aland"                            
#> [49] "Dominica"                          "Faroe Islands"                    
#> [51] "Comoros"                           "Macao S.A.R"                      
#> [53] "Hong Kong S.A.R."                  "Cabo Verde"                       
#> [55] "Barbados"                          "Bahrain"                          
#> [57] "Indian Ocean Territories"          "Heard Island and McDonald Islands"
#> [59] "Norfolk Island"                    "Ashmore and Cartier Islands"      
#> [61] "Antigua and Barbuda"               "Andorra"                          
#> [63] "Siachen Glacier"                   "Sint Maarten"                     
#> [65] "Tuvalu"

