r-spatial / sf

Simple Features for R
https://r-spatial.github.io/sf/
Other
1.34k stars 298 forks source link

st_crop does not work correctly with bounding box with geographic coordinates. #2443

Closed sostberg closed 1 month ago

sostberg commented 1 month ago

Describe the bug I'm trying to cut out a subset of features from a simple feature collection using a bounding box. The simple feature collection uses geographic lon/lat coordinates.

The resulting simple feature collection does have a larger bounding box than would I supplied to st_crop. More importantly, it does not contain all the features from the original sf object that fall within the bounding box.

st_crop seems to work correctly if I switch off s2 functionality (sf_use_s2(FALSE)). Is this the only possible fix? Should s2 functionality be switched off in general if working with lon/lat data?

``` > sessionInfo() R version 4.4.1 (2024-06-14) Platform: x86_64-pc-linux-gnu Running under: Linux Mint 21.3 Matrix products: default BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so; LAPACK version 3.10.0 locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 [5] LC_MONETARY=de_DE.UTF-8 LC_MESSAGES=en_US.UTF-8 [7] LC_PAPER=de_DE.UTF-8 LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=de_DE.UTF-8 LC_IDENTIFICATION=C time zone: Europe/Berlin tzcode source: system (glibc) attached base packages: [1] parallel stats graphics grDevices utils datasets methods [8] base other attached packages: [1] maps_3.4.2 doParallel_1.0.17 iterators_1.0.14 units_0.8-5 [5] stringi_1.8.4 foreach_1.5.2 rgdal_1.6-7 lwgeom_0.2-14 [9] raster_3.6-26 sp_2.1-4 sf_1.0-16 loaded via a namespace (and not attached): [1] s2_1.1.6 codetools_0.2-19 lattice_0.22-5 e1071_1.7-14 [5] magrittr_2.0.3 KernSmooth_2.23-24 wk_0.9.1 classInt_0.4-10 [9] terra_1.7-78 grid_4.4.1 DBI_1.2.3 proxy_0.4-27 [13] class_7.3-22 compiler_4.4.1 tools_4.4.1 Rcpp_1.0.12 ``` ``` > sf::sf_extSoftVersion() GEOS GDAL proj.4 GDAL_with_GEOS USE_PROJ_H "3.10.2" "3.4.1" "8.2.1" "true" "true" PROJ "8.2.1" ```
rsbivand commented 1 month ago

Please provide a reproducible example, and note that bounding boxes are not rectangular in geographical coordinates.

sostberg commented 1 month ago

Here is a working example:

library(raster)
library(sf)
# Create global raster at 0.5° spatial resolution
gridraster <- raster(ncols = 720, nrows = 360)
crs(gridraster) <- "+proj=longlat +datum=WGS84 +no_defs"
# Set values to index
gridraster[] <- seq_len(ncell(gridraster))
# Convert raster to polygons
gridshape <- rasterToPolygons(gridraster)
# Convert to sf object
gridshape <- st_as_sf(gridshape)
# Set up smaller bounding box
subset_bbox <- c(xmin = -175.1, ymin = -18.6, xmax = 178.1, ymax = 6.1)
# Crop gridshape to smaller bounding box with sf_use_s2(TRUE)
sf_use_s2(TRUE)
gridshape_subset1 <- st_crop(gridshape, subset_bbox)
# results in feature collection with 714 features

# Crop gridshape to smaller bounding box with sf_use_s2(FALSE)
sf_use_s2(FALSE)
gridshape_subset2 <- st_crop(gridshape, subset_bbox)
# results in feature collection with 36108 features

I would expect the result I get with sf_use_s2(FALSE)

rsbivand commented 1 month ago

Why use raster? terra is preferred, but the problem of treating the sphere as planar remains. Neither raster nor terra use spherical concepts, so turning off s2 uses their logic. sf prefers s2 logic for geographical coordinates. So rather use terra if you prefer its logic?

library(terra)
r <- rast(ncol=720, nrow=360, xmin=-180, xmax=180, ymin=-90, ymax=90)
e <- ext(-175.1, xmax=178.1, ymin=-18.6, ymax=6.1)
res <- crop(r, e)
out <- sf::st_as_sf(as.polygons(res))
out
sostberg commented 1 month ago

My original code is much more extensive. It uses a lot of raster-based code. Also, the cropping is done multiple times within a loop, so doing the cropping on the raster and then converting the cropped raster to polygons isn't an option.

Note: I do plan to switch my project from raster to terra but that will be a big effort.

rsbivand commented 1 month ago

Well, if your workflow depended on pre-s2 behaviour, then you'll have to turn off s2. s2 has been on by default for over three years. Please also see that you can set a pre-session option to treat all geometries as planar even if they are spherical: https://r-spatial.github.io/sf/news/index.html, in 1.0.9: use the global options("sf_use_s2") to determine whether to use s2, in 1.0.0: use s2 spherical geometry as default when coordinates are ellipsoidal. This can be switched off (defaulting to planar geometry, using GEOS, as in sf < 1.0-0) by setting environment variable _SF_USE_S2 to false before package sf is loaded, or by sf_use_s2(FALSE) - note that the environment variable _SF_USE_S2 can also be used. st_crop has been behaving as you now report since sf 1.0.0 in 2021. Please also refer to: https://r-spatial.org/book/04-Spherical.html for the reasoning behind the default choices in sf.

sostberg commented 1 month ago

I'm also using st_intersection between the gridshape or gridshape_subset and other polygons with lon/lat coordinates. Would you suggest to turn off s2 behaviour only for st_crop or also for st_intersection? I have noticed that the result of st_intersection can change slightly with or without s2 behaviour but I do not have an expectation of which version is right or wrong for st_intersection.

rsbivand commented 1 month ago

Using planar computation on spherical geometries is always "wrong", but has been accepted as computing on the sphere was seen as too difficult (see the quote we used as the motto in our chapter). The differences stem from treating lines between two points on the surface of a sphere as "straight" when they most often are curves, hence S2 which is bounded and wraps around, as contrasted with R2, where the coordinates are on two real dimensions from -Inf to +Inf. st_intersection differences may then stem from s2 taking account of this curvature.

sostberg commented 1 month ago

Thank you. So my take-away is that the behaviour of st_crop with s2 switched on is not a bug. And generally, it is recommended to use s2 for spherical geometries.

rsbivand commented 1 month ago

Yes, especially for workflows involving vector geometries. Rasters are not so obvious, because planar rasters have equal area cells across the whole study area, which is not the case on the sphere in general. So some analysts may prefer approaches through for example dggridR https://cran.r-project.org/package=dggridR or similar, providing equal area cells on the sphere/ellipsoid. However, a lot of workflows have just gone ahead and treated apparent rectangles on the sphere as such, to avoid the problems arising, and because possible ways round have not really been adopted broadly.

Nowosad commented 1 month ago

@sostberg I think that reading https://r-spatial.org/book/04-Spherical.html may help you to grasp the general idea

edzer commented 1 month ago

And generally, it is recommended to use s2 for spherical geometries.

Switching s2 off means you're interpreting geodetic coordinates as Cartesian coordinates. For some operations this may be OK, for others not; sf gives you warnings that it does this either way. terra does its best and has many spherical operations (even buffer), but is not very informative when it takes the Cartesian path, see e.g. https://github.com/rspatial/terra/issues/1576