r-spatial / sf

Simple Features for R
https://r-spatial.github.io/sf/
Other
1.34k stars 298 forks source link

st_buffer doesn't work properly for lat-long coordinates and small buffers #1692

Open barryrowlingson opened 3 years ago

barryrowlingson commented 3 years ago

example: make a one-degree unit square, in epsg 4326 and equirectangular

> library(sf)
Linking to GEOS 3.8.0, GDAL 3.0.4, PROJ 6.3.1
> m = cbind(c(0,1,1,0,0),c(0,0,1,1,0))
> p1 = st_sfc(st_polygon(list(m)), crs=4326)
> p1e = st_transform(p1, "+proj=eqc")

Now buffer the equirectangular by a generous chunk of its width and plot the results:

> p1eb = st_buffer(p1e, dist=8000)
> plot(p1eb)
> axis(1)
> plot(p1e, add=TRUE)

image

Looks good. Now try with the lat-long square, using a 0.1 buffer distance. I first thought this was meant to be degrees which is why I tried this. I'm sure this is metres now. Anyway it gets me this:

> p1b = st_buffer(p1, dist=0.1)
> plot(p1b)
> axis(1)

image

It seems to have extended slightly to the S and W except for a notch in the SW corner (at Null Island...)

Creating a zero-distance buffer shows a bit more glitchyness:

> p1b = st_buffer(p1, dist=0)
> plot(p1b, lwd=3, border="blue")
> plot(p1,border="red", add=TRUE)

image

Even large buffers (which I guess are in metres?) produce glitchy results in places:

> p1b10k = st_buffer(p1, dist=10000)
> plot(p1b10k, lwd=3)

the corners are quite steppy all round, and there's two nicks taken out of the northern border. I hope this doesn't start a war:

image

Buffering the equirectangular square by 10km and overlaying with the buffered 4326 square (transformed to equirectangular) shows the steppiness of the 4326 buffer compared to the smooth equirectangular buffer:

> p1e10k = st_buffer(p1e, dist=10000)
> plot(st_transform(p1b10k,st_crs(p1e10k)), lwd=1,add=TRUE,border="red")
> axis(1)

image

I'd hazard a guess that this is some tolerance when buffering lat-long coords, and possibly related to the s2 spherical geometry changes? I'd also hazard a guess that you've seen this and already fixed it :)

Session Info:

> sessionInfo()
R version 4.0.3 (2020-10-10)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.2 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/liblapack.so.3

locale:
 [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8    
 [5] LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_GB.UTF-8   
 [7] LC_PAPER=en_GB.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] sf_1.0-0

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.5         class_7.3-17       crayon_1.3.4       dplyr_1.0.2       
 [5] wk_0.4.1           grid_4.0.3         R6_2.5.0           lifecycle_1.0.0   
 [9] DBI_1.1.0          magrittr_2.0.1     e1071_1.7-4        units_0.6-7       
[13] pillar_1.4.7       KernSmooth_2.23-17 rlang_0.4.10       ellipsis_0.3.1    
[17] vctrs_0.3.6        generics_0.1.0     s2_1.0.5           tools_4.0.3       
[21] glue_1.4.2         purrr_0.3.4        compiler_4.0.3     pkgconfig_2.0.3   
[25] classInt_0.4-3     tidyselect_1.1.0   tibble_3.0.4      
edzer commented 3 years ago

No, it's not fixed; see https://r-spatial.github.io/sf/articles/sf7.html#buffers-1 for a description.

Your example is on the equator, but with realistic examples anything further away will have direction-dependent buffers when treating ellipsoidal coordinates as Cartesian (the GEOS way). So it's not good what we have now, but it's not good what we had either.

edzer commented 3 years ago

Here is what I mean:

library(sf)
# Linking to GEOS 3.9.0, GDAL 3.2.1, PROJ 7.2.1

(pt = st_sfc(st_point(c(7, 52)), crs = 'OGC:CRS84'))
# Geometry set for 1 feature 
# Geometry type: POINT
# Dimension:     XY
# Bounding box:  xmin: 7 ymin: 52 xmax: 7 ymax: 52
# Geodetic CRS:  WGS 84
# POINT (7 52)

sf_use_s2(TRUE)
plot(st_buffer(pt, units::set_units(1, degree)), axes = TRUE, main = "s2")
plot(pt, add = TRUE)

x1

sf_use_s2(FALSE)
# Spherical geometry (s2) switched off
plot(st_buffer(pt, units::set_units(1, degree)), axes = TRUE, main = "GEOS")
# Warning message:
# In st_buffer.sfc(pt, units::set_units(1, degree)) :
#   st_buffer does not correctly buffer longitude/latitude data
plot(pt, add = TRUE)

x2

The first one is ragged, the second one has distances wrong. Picking a sensible default for the number of cells in the first case is still an open problem; narrow buffers around lines e.g. will need a lot.

The cell-based buffer from s2 always contains the entire "true" (smooth) buffer shape, and can be used to pre-select features, using distance calculations on them afterwards.

dblodgett-usgs commented 2 years ago

I've used a 0 buffer to clean polygons over the years. This issue of a very small buffer adding noise to polygon edges throws a wrench in that. I guess the buffer-as-cleaning hack should go by the way side, but how else should I handle removing duplicate geometry nodes on polygons?

rsbivand commented 2 years ago

@dblodgett-usgs What is sf_use_s2()? For planar geometries, it should work, but for spherical geometries may not, as I think your ndhplus issue indicates. Also the st_make_valid() function for planar geometries only is probably more robust than zero-buffering for recent GEOS versions.

dblodgett-usgs commented 2 years ago

sf::sf_use_s2(FALSE) forces sf to use geos rather than s2 where applicable, by my understanding. In my package code, where use a 0 buffer to clean up potentially problematic geometry, I can just use that to avoid the issue discussed above.

It's good to know that st_make_valid() is going to work better. I remember needing this to get geometries that would play nice with ArcGIS's geometry validation rules -- that has always been a bit of a dark art in my experience as things that are valid in GEOS or other tools like PostGIS or the Java Topology Suite can still be invalid in Arc.

edzer commented 2 years ago

Does Arc follow some kind of open standard about how it defines valid?

dblodgett-usgs commented 2 years ago

There's a lot going on there. Maybe some documentation has come out recently, but when I was working on a subsetter, I was unable to find anything definitive. I went through a whole process to create this: https://github.com/USGS-R/nhdplusTools/blob/master/R/subset_nhdplus.R#L604

Where I would load something into a geopackage and try to open it in arcpro. One of the major things was duplicated nodes -- Arc would just bomb without telling me anything about why but removing the duplicate nodes solved the issue.