ropensci / rnaturalearth

An R package to hold and facilitate interaction with natural earth map data :earth_africa:
http://ropensci.github.io/rnaturalearth/
Other
214 stars 24 forks source link

Non valid geometries? #78

Open MatthieuStigler opened 1 year ago

MatthieuStigler commented 1 year ago

It seems that there are some "invalid" geometries contained in ne_countries (both small and medium scales) detected by sf, both with the old R2 system and new S2?

Also, even on a subset of countries where sf::st_is_valid() does not detect issues, an operation like st_crop can lead to some issues?

library(rnaturalearth)
library(sf)
#> Linking to GEOS 3.10.2, GDAL 3.4.3, PROJ 8.2.0; sf_use_s2() is TRUE

world_S <- ne_countries(scale = "small", returnclass = "sf")
world_M <- ne_countries(scale = "medium", returnclass = "sf")

## check with S2
sf_use_s2(TRUE)
all(st_is_valid(world_S))
#> [1] FALSE
all(st_is_valid(world_M))
#> [1] FALSE

## check with R2
sf_use_s2(FALSE)
#> Spherical geometry (s2) switched off
all(st_is_valid(world_S))
#> [1] TRUE
all(st_is_valid(world_M))
#> [1] FALSE

## which ones?
sf_use_s2(TRUE)
#> Spherical geometry (s2) switched on
world_S$name[!st_is_valid(world_S)]
#> [1] "Sudan"  "Russia"
world_M$name[!st_is_valid(world_M)]
#> [1] "Antarctica" "Fiji"       "India"      "Russia"     "Sudan"     
#> [6] "S. Sudan"

sf_use_s2(FALSE)
#> Spherical geometry (s2) switched off
world_S$name[!st_is_valid(world_S)]
#> character(0)
world_M$name[!st_is_valid(world_M)]
#> [1] "India"

### issues even when valid
some_countries <- world_M |> 
  subset(continent == "Europe"| name %in% c("United States")) |> 
  subset(name !="Russia") |> 
  st_geometry() 

sf_use_s2(TRUE);all(st_is_valid(some_countries))
#> Spherical geometry (s2) switched on
#> [1] TRUE
sf_use_s2(FALSE);all(st_is_valid(some_countries))
#> Spherical geometry (s2) switched off
#> [1] TRUE

sf_use_s2(TRUE)
#> Spherical geometry (s2) switched on
some_countries |> 
  st_crop( c(xmin=-120, ymin=30, xmax=30, ymax= 55)) |>
  plot(border="blue", lwd=2)
plot(some_countries, add=TRUE)

Created on 2023-05-25 with reprex v2.0.2

PMassicotte commented 1 year ago

We are just providing an interface for the data provided by naturalearth. Maybe you could contact them? Meanwhile, you could try to fix issues with st_make_valid().

PMassicotte commented 1 year ago

Excellent!

MatthieuStigler commented 1 year ago

The problem is that the issue of polygons is quite complicated and I won't be able to follow-up much there, but anyway, it's also good just to mention the issue if other face it.

Unfortunately, the st_make_valid() approach does not work in the second case, possible because st_is_valid() does not flag those as invalid :-(

PMassicotte commented 1 year ago

Just tried to make it valid with terra but no luck:

terra::vect(world_S) |>
  terra::makeValid() |>
  st_as_sf() |>
  st_is_valid()
PMassicotte commented 1 year ago

I even tried to repair it directly with gdal:

ogr2ogr -f "ESRI Shapefile" output.shp world_s.shp -dialect sqlite -sql "SELECT ST_MakeValid(geometry) AS geometry FROM world_S" 
MatthieuStigler commented 1 year ago

thanks for trying!

I am wondering whether the problem comes from this (https://r-spatial.github.io/sf/articles/sf7.html)

Simple feature geometries should obey a ring direction too: exterior rings should be counter clockwise, interior (hole) rings should be clockwise, ...... many (legacy) datasets have wrong ring directions. With wrong ring directions, many things still work.

and the discussion on LINESTRING(-179 0,179 0), given that most datasets in rnaturalearth will contain polygons/lines close to 180? When reading the original datasets, did you check the point about check_ring_dir? Thanks!

mps9506 commented 1 year ago

Not really a fix but with s2 I think this helps out:

library(rnaturalearth)
#> Warning: package 'rnaturalearth' was built under R version 4.2.3
library(sf)
#> Linking to GEOS 3.9.1, GDAL 3.4.3, PROJ 7.2.1; sf_use_s2() is TRUE

worldS <- ne_countries(scale = "small", returnclass = "sf")

sf_use_s2(TRUE)

all(st_is_valid(worldS))
#> [1] FALSE
worldS$name[!st_is_valid(worldS)]
#> [1] "Sudan"  "Russia"

worldS_2 <- worldS$geometry |> 
  s2::as_s2_geography(check = F) |> 
  s2::s2_union() |> 
  s2::s2_rebuild() |> 
  st_as_sfc()

all(st_is_valid(worldS_2))
#> [1] TRUE
worldS$name[!st_is_valid(worldS_2)]
#> character(0)

Created on 2023-05-25 with reprex v2.0.2

MatthieuStigler commented 1 year ago

Thanks @mps9506 !!

Unfortunately, though the transformed polygons pass the st_is_valid test, the visual tests is suggesting there's a big problem with Russia spreading beyond its borders (political considerations aside):

library(rnaturalearth)
library(sf)
#> Linking to GEOS 3.10.2, GDAL 3.4.3, PROJ 8.2.0; sf_use_s2() is TRUE

worldS <- ne_countries(scale = "small", returnclass = "sf")

worldS_2 <- worldS$geometry |> 
  s2::as_s2_geography(check = F) |> 
  s2::s2_union() |> 
  s2::s2_rebuild() |> 
  st_as_sfc()

all(st_is_valid(worldS_2))
#> [1] TRUE

plot(worldS_2[which(worldS$name=="Russia")])

Created on 2023-05-26 with reprex v2.0.2

MatthieuStigler commented 1 year ago

@PMassicotte it seems package s2 does also contain a version of naturalerth data, cf. s2::s2_data_countries(). It might be insightful to see how they addressed the invalid polygons part, especially given the future retirement of rgeos (which would make use of s2 the standard I believe?). Thanks!

PMassicotte commented 1 year ago

Looks like they write and read the data to fix the issue:

https://github.com/r-spatial/s2/blob/4fe0c97df91c98a24f379bfc3d8d4544f4835117/data-raw/countries.R#L7-L10

PMassicotte commented 1 year ago

One option could be to do it on the cron job used in rnaturalearthdata.

mps9506 commented 1 year ago

@MatthieuStigler I should know to plot the data by now!

library(rnaturalearth)
#> Warning: package 'rnaturalearth' was built under R version 4.2.3
library(sf)
#> Linking to GEOS 3.9.1, GDAL 3.4.3, PROJ 7.2.1; sf_use_s2() is TRUE

sf_use_s2()
#> [1] TRUE

worldS <- ne_countries(scale = "small", returnclass = "sf") 

worldS_2 <- worldS$geometry |>
  st_transform(crs = 3857) |> 
  st_make_valid() |>
  st_as_s2(check = F) |> 
  s2::s2_union() |> 
  s2::s2_rebuild(options = s2::s2_options(validate = TRUE)) |>
  st_as_sfc() |> 
  st_wrap_dateline()

all(st_is_valid(worldS_2))
#> [1] TRUE

plot(worldS_2[which(worldS$name=="Russia")])


plot(worldS_2)

Created on 2023-05-26 with reprex v2.0.2

MatthieuStigler commented 1 year ago

great, thanks @mps9506 !

PMassicotte commented 1 year ago

Just noticed with newer gdal and proj, it works for me:

library(rnaturalearth)
library(sf)
#> Linking to GEOS 3.11.1, GDAL 3.7.0, PROJ 9.2.0; sf_use_s2() is TRUE

worldS <- ne_countries(scale = "small", returnclass = "sf")

worldS_2 <- worldS$geometry
all(st_is_valid(worldS_2))
#> [1] FALSE

plot(worldS_2[which(worldS$name == "Russia")])

Created on 2023-05-28 with reprex v2.0.2