r-spatial / sf

Simple Features for R
https://r-spatial.github.io/sf/
Other
1.34k stars 297 forks source link

`st_intersection` return length changes depending on EPSG in sf 1.0-0 #1691

Closed christopherkenny closed 3 years ago

christopherkenny commented 3 years ago

Hello. After updating to sf 1.0-0 today, I'm noticing a change that I wasn't expecting. I assume this has to do with the use of s2. It appears that the length of an intersection now depends on the projection. Using 4269, the output is shorter than using 3857. I've included a toy example below:

library(sf)
#> Linking to GEOS 3.9.0, GDAL 3.2.1, PROJ 7.2.1
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

sfc  <- st_sfc(st_polygon(list(rbind(c(0,0), c(1,0), c(1,1), c(0,1), c(0,0)))))
grid <- st_as_sf(st_make_grid(sfc, n = 4))

box <- st_as_sf(st_sfc(st_polygon(list(rbind(c(.25,.25), c(.75,.25), c(.75,.75), c(.25,.75), c(.25, .25))))))

st_crs(grid) <- 4269
st_crs(box) <- 4269

ints <- sf::st_intersection(x = grid, y = st_union(box))
nrow(ints)
#> [1] 11

st_crs(grid) <- 3857
#> Warning: st_crs<- : replacing crs does not reproject data; use st_transform for
#> that
st_crs(box) <- 3857
#> Warning: st_crs<- : replacing crs does not reproject data; use st_transform for
#> that

ints <- sf::st_intersection(x = grid, y = st_union(box))
nrow(ints)
#> [1] 16

Created on 2021-06-09 by the reprex package (v2.0.0)

Rerunning this toy example on sf 0.9.8, both have length 16.

If this is intended or I missed a new argument that I should be using, I apologize. Any thoughts would be helpful.

edzer commented 3 years ago

Rerunning this toy example on sf 0.9.8, both have length 16.

Yes, because they formerly took exactly identical code routes.

With s2, the first toy example considers a "square" area on the sphere, with ribs that are great cirlce arcs with length of one quarter degree (around 27 km), your second toy example concerns a 1 m x 1 m square area on a flat plane.

You can get a little bit closer by setting the s2_model to "closed":

library(sf)
# Linking to GEOS 3.9.0, GDAL 3.2.1, PROJ 7.2.1
sfc  <- st_sfc(st_polygon(list(rbind(c(0,0), c(1,0), c(1,1), c(0,1), c(0,0)))))
grid_ll <- st_as_sf(st_make_grid(sfc, n = 4))
box_ll <- st_sfc(st_polygon(list(rbind(c(.25,.25), c(.5, .25), c(.75,.25), c(.75, .5), c(.75,.75), c(.5, .75), c(.25,.75), c(.25, .5), c(.25, .25)))))
# added all half-points!

st_crs(grid_ll) <- 4269
st_crs(box_ll) <- 4269

ints <- sf::st_intersection(x = grid_ll, y = st_union(box_ll), s2_model = "closed")
nrow(ints)
# [1] 14

print(st_geometry(ints), n = nrow(ints))
# Geometry set for 14 features 
# Geometry type: GEOMETRY
# Dimension:     XY
# Bounding box:  xmin: 0.25 ymin: 0.25 xmax: 0.75 ymax: 0.75
# Geodetic CRS:  NAD83
# POINT (0.25 0.25)
# LINESTRING (0.5 0.25, 0.25 0.25, 0.5 0.25)
# LINESTRING (0.75 0.25, 0.5 0.25, 0.75 0.25)
# MULTILINESTRING ((0.75 0.25, 0.75 0.25, 0.75 0....
# POINT (0.25 0.25)
# POLYGON ((0.25 0.25, 0.5 0.25, 0.5 0.5, 0.25 0....
# POLYGON ((0.5 0.25, 0.75 0.25, 0.75 0.5, 0.5 0....
# POLYGON ((0.75 0.5, 0.75 0.5, 0.75 0.25, 0.75 0...
# POLYGON ((0.25 0.5, 0.5 0.5, 0.5 0.75, 0.25 0.7...
# POLYGON ((0.5 0.5, 0.75 0.5, 0.75 0.75, 0.5 0.7...
# POLYGON ((0.75 0.75, 0.75 0.5, 0.75 0.5, 0.75 0...
# GEOMETRYCOLLECTION (MULTILINESTRING ((0.25 0.75...
# POLYGON ((0.5 0.75, 0.5 0.75, 0.75 0.75, 0.5 0....
# GEOMETRYCOLLECTION (LINESTRING (0.75 0.75, 0.75...
st_geometry(ints)[[12]]
# GEOMETRYCOLLECTION (MULTILINESTRING ((0.25 0.75, 0.25 0.75), (0.25 0.75, 0.5 0.75, 0.25 0.75), (0.25 0.75, 0.25 0.75), (0.25 0.75, 0.25 0.75, 0.25 0.75)), POLYGON ((0.5 0.75, 0.5 0.75, 0.5 0.75, 0.5 0.75)))
st_geometry(ints)[[14]]
# GEOMETRYCOLLECTION (LINESTRING (0.75 0.75, 0.75 0.75, 0.75 0.75), POLYGON ((0.75 0.75, 0.75 0.75, 0.75 0.75, 0.75 0.75)))

where you see quite a few funny geometries being returned. We didn't make `"closed" the default because it seems to be less robust, possibly less tested or used, and maybe not a good idea at all.

If you search the geos-devel mailing list for "OverlayNG", you'll find that even in that arena there is no unity on what intersection should return (all geometries in a collection, or only the one with the highest dimension). Aalthough, admittedly, there's probably unity about this particular case.

If this is problematic, it would be helpful if you could come up with a real data example.

christopherkenny commented 3 years ago

Thank you for the response. I will look into the geos-devel discussion to get an idea of what's going on.

My typical use case for looking at the intersections has been to figure out the geographic relationship between different sources of administrative data. I've attached a simple case of this with block group and block data from US Census data, in ex_data.zip.

library(sf)
#> Linking to GEOS 3.9.0, GDAL 3.2.1, PROJ 7.2.1
blocks <- readRDS('blocks.Rds')
block_group <- readRDS('block_group.Rds')

ints <- sf::st_intersection(blocks, block_group)
#> Warning: attribute variables are assumed to be spatially constant throughout all
#> geometries
nrow(ints)
#> [1] 30

blocks2 <- st_transform(blocks, 3857)
block_group2 <- st_transform(block_group, 3857)

ints2 <- sf::st_intersection(blocks2, block_group2)
#> Warning: attribute variables are assumed to be spatially constant throughout all
#> geometries
nrow(ints2)
#> [1] 38

Created on 2021-06-10 by the reprex package (v2.0.0)

One variant of where I've used this approach is the geo_trim() function here, where I figure out the intersections and then subset based on the percent area overlap.

I admit, this might be a case where I'm just asking the wrong question. I noticed the change quickly because it is a key step in a simple solution to a common problem in redistricting work, but it may just be the wrong solution.

edzer commented 3 years ago

The differences in your toy examples are about polygons touching along lines or in a point. If your interest is in overlaps, and they're not huge and don't contain very long lines where the particular interpretation of "straight" matters (as in https://github.com/obrl-soil/slga/issues/6) then I wouldn't worry too much.

In your real example above, both ints and ints2 have 30 intersecting polygons.

christopherkenny commented 3 years ago

Thank you -- that's a good point. With that in mind, since I'm interested in the areas after, I can use the row names to do something along the lines of:

  poly <- attr(blocks, 'row.names') %in% attr(ints, 'row.names')
  areaints <- rep(0, nrow(blocks))
  areaints[poly] <- st_area(st_make_valid(ints, NA_on_exception = TRUE))

Thanks again!

(Originally said NA_on_error, but the argument is NA_on_exception)

edzer commented 3 years ago

Yes, it may be cheaper to use st_dimension() on the intersection geometries to filter out the polygons/multipolygons.