r-spatial / sf

Simple Features for R
https://r-spatial.github.io/sf/
Other
1.33k stars 293 forks source link

st_union() has Inconsistent Join Behavior, Varies by CRS #2110

Closed k6adams closed 1 year ago

k6adams commented 1 year ago

Issue A colleague and I recently discovered that st_union() changes its' join behavior depending on the Coordinate Reference System used.

In following example, st_union() is tasked with joining two geometry columns.

Specifying rowwise() prior to the union resolves the error. So something about the way st_union() is specifying the join must be inconsistent.

I have not checked other CRSs.

Example

library(tidyverse)
library(sf)

pt_1 <- sf::st_point(x = c(-81, 29)) %>%
  sf::st_sfc(crs = sf::st_crs("EPSG:4326"))
pt_2 <- sf::st_point(x = c(-82, 30)) %>%
  sf::st_sfc(crs = sf::st_crs("EPSG:4326"))
pt_3 <- sf::st_point(x = c(-83, 31)) %>%
  sf::st_sfc(crs = sf::st_crs("EPSG:4326"))
pt_4 <- sf::st_point(x = c(-84, 32)) %>%
  sf::st_sfc(crs = sf::st_crs("EPSG:4326"))

geometry <- c(pt_1, pt_2, pt_3, pt_4)

data_4326 <- tibble::tibble(
  id = c(1,2,3,4),
  dist = units::as_units(c(10,10,10, 10), "km"),
  geometry = geometry
  ) %>%
  sf::st_as_sf(crs = "EPSG:4326")

buffers_4326 <- data_4326 %>%
  dplyr::mutate(
    buffer = sf::st_buffer(geometry, dist = dist)
    ) %>%
  dplyr::mutate(
    lag_buffer = dplyr::lag(buffer, order_by = id) %>%
      sf::st_sfc()
    )

ggplot() +
  geom_sf(
    data = buffers_4326,
    aes(geometry = buffer)
    )

# Works, in 4326
track_4326 <- buffers_4326 %>%
  dplyr::mutate(
    track = sf::st_union(buffer, lag_buffer)
    )

# Doesn't work, in 5070
buffers_5070 <- data_4326 %>%
  sf::st_transform(crs = "EPSG:5070") %>%
  dplyr::mutate(
    buffer = sf::st_buffer(geometry, dist = dist)
    ) %>%
  dplyr::mutate(
    lag_buffer = dplyr::lag(buffer, order_by = id) %>%
      sf::st_sfc()
    )

ggplot() +
  geom_sf(
    data = buffers_5070, aes(geometry = buffer)
    )

track_5070 <- buffers_5070 %>%
  dplyr::mutate(
    track = sf::st_union(buffer, lag_buffer)
    )

## Works when rowwise is added
track_5070 <- buffers_5070 %>%
  rowwise() %>%
  dplyr::mutate(
    track = sf::st_union(buffer, lag_buffer)
  )
R version 4.2.2 (2022-10-31 ucrt) Platform: x86_64-w64-mingw32/x64 (64-bit) Running under: Windows 10 x64 (build 19044) Matrix products: default locale: [1] LC_COLLATE=English_United States.utf8 LC_CTYPE=English_United States.utf8 [3] LC_MONETARY=English_United States.utf8 LC_NUMERIC=C [5] LC_TIME=English_United States.utf8 attached base packages: [1] stats graphics grDevices datasets utils methods base other attached packages: [1] sf_1.0-9 lubridate_1.9.2 forcats_1.0.0 stringr_1.5.0 dplyr_1.1.0 purrr_1.0.1 readr_2.1.4 [8] tidyr_1.3.0 tibble_3.1.8 ggplot2_3.4.1 tidyverse_2.0.0 loaded via a namespace (and not attached): [1] Rcpp_1.0.10 pillar_1.8.1 compiler_4.2.2 class_7.3-20 tools_4.2.2 [6] timechange_0.2.0 lifecycle_1.0.3 gtable_0.3.1 pkgconfig_2.0.3 rlang_1.0.6 [11] DBI_1.1.3 cli_3.6.0 rstudioapi_0.14 e1071_1.7-13 s2_1.1.2 [16] withr_2.5.0 generics_0.1.3 vctrs_0.5.2 hms_1.1.2 classInt_0.4-9 [21] grid_4.2.2 tidyselect_1.2.0 glue_1.6.2 R6_2.5.1 fansi_1.0.4 [26] farver_2.1.1 tzdb_0.3.0 magrittr_2.0.3 ellipsis_0.3.2 scales_1.2.1 [31] units_0.8-1 colorspace_2.1-0 renv_0.15.5 utf8_1.2.3 KernSmooth_2.23-20 [36] stringi_1.7.12 proxy_0.4-27 wk_0.7.1 munsell_0.5.0 GEOS GDAL proj.4 GDAL_with_GEOS USE_PROJ_H PROJ "3.9.3" "3.5.2" "8.2.1" "true" "true" "8.2.1"
kadyb commented 1 year ago

{sf} uses two different engines for calculations, GEOS for planar geometry (e.g. EPSG:5070) and S2 for spherical geometry (e.g. EPSG:4326), hence different results may come. I also checked the GEOS bindings and it seems the results here are more what you expect? But this also probably works like rowwise().

geos::geos_union(buffers_5070$buffer, buffers_5070$lag_buffer)
#> <geos_geometry[4] with CRS=NAD83 / Conus Albers>
#> [1] <POLYGON [1447169 763770...1467169 783770]>      
#> [2] <MULTIPOLYGON [1334483 763770...1467169 878587]> 
#> [3] <MULTIPOLYGON [1223922 858587...1354483 974995]> 
#> [4] <MULTIPOLYGON [1115513 954995...1243922 1072936]>
k6adams commented 1 year ago

{sf} uses two different engines for calculations, GEOS for planar geometry (e.g. EPSG:5070) and S2 for spherical geometry (e.g. EPSG:4326), hence different results may come. I also checked the GEOS bindings and it seems the results here are more what you expect? But this also probably works like rowwise().

geos::geos_union(buffers_5070$buffer, buffers_5070$lag_buffer)
#> <geos_geometry[4] with CRS=NAD83 / Conus Albers>
#> [1] <POLYGON [1447169 763770...1467169 783770]>      
#> [2] <MULTIPOLYGON [1334483 763770...1467169 878587]> 
#> [3] <MULTIPOLYGON [1223922 858587...1354483 974995]> 
#> [4] <MULTIPOLYGON [1115513 954995...1243922 1072936]>

I have a vague understanding that 5070 uses GEOS, and 4326 uses s2. But yeah, I am expecting the output you are getting from geo_union() to be produced by st_union().

edzer commented 1 year ago

See https://r-spatial.org/r/2020/06/17/s2.html

k6adams commented 1 year ago

@edzer should this be closed? Is the expectation that end user should know that st_union() produces a data product of different dimensions depending on the underlying geo engine? If that is the case, some indication of that in the function documentation would be a great addition. just hoping to spare people time/grief/confusion. I recognize that you likely leave your hands full and this would not be high priority.

rsbivand commented 1 year ago

Yes, it is necessary that users do grasp the difference between planar anf spherical geometries. Please provide a PR if you can improve the documentation. Using geos will use a different, usually older, GEOS, and avoid checks trapping misuse of planar predicates and operations on spherical geometries.

k6adams commented 1 year ago

@edzer forgive me, I am relatively ignorant when it comes to spatial and GitHub. Two questions…

What is PR?

Also, I understand that the user should have a know the difference between vs spherical and planar geometries. But I am failing to comprehend how the dimensionality of the sf output produced by the st_union function is related.

Should I have known that the union of the planar geometries would result in a cross product change in the data’s dimensionality?, and that the union of the spherical representations would result in no change in the structure/dimensions of the data?

edzer commented 1 year ago

I closed it because it didn't point to something that is not already widely known: that doing geometrical operation on the plane or on the sphere are very different activities, and are provided by software libraries interfaced by sf that have a very different history (GEOS and s2geometry). It would have been helpful if someone had ironed out all the differences between these two libraries, but that hasn't happened.

Also, your example involves a lot of different commands, the section marked with # Doesn't work, in 5070 works for me, it uses lag and rowwise which I've never used and so don't understand how they relate to your problem. To get help, reduce your problem to a minimal reproducible example that doesn't use tidyverse but just those sf commands that illustrate your problem.