rspatial / terra

R package for spatial data handling https://rspatial.github.io/terra/reference/terra-package.html
GNU General Public License v3.0
543 stars 90 forks source link

Exact equivalent to sf::st_intersection using terra #1608

Closed gibbona1 closed 1 month ago

gibbona1 commented 1 month ago

If I have two sf objects (polygons and multipolygons), df1 and df2, which cover the same area but at different times. I want to get the intersection of the two maps.

I currently use

df_int <- st_intersection(df1, df2) which is correct but the result is pretty slow for large sf objects. The below method is significanlty faster (10-20 times) but sometimes has different results. any idea why this might be the case? They have the same CRS.

terra1 <- vect(df1)
terra2 <- vect(df2)

df_int <- intersect(terra2, terra1) %>% st_as_sf()

I've also tried intersect(terra1, terra2) which can be different again.

How do I get the speed benefits of terra but have basically identical output to the sf version?

kadyb commented 1 month ago

If your data is in geographic coordinate system, the s2 engine will be used for calculations on ellipsoid in {sf} ({terra} usually uses the GEOS engine for planar calculations), which may be the reason for the difference (check the sf::sf_use_s2(FALSE) option for comparison of results).

If you want to speed up sf::st_intersection(), maybe this thread will helpful: https://github.com/r-spatial/sf/issues/801

gibbona1 commented 1 month ago

Thanks for the quick answer @kadyb! Seems to work well. I had sf::sf_use_s2(TRUE) by default in my shiny app so I'll switch it off for better comparison.

gibbona1 commented 1 month ago

Apologies for reopening but there still seems to be an issue in some spatial datasets. switching s2 to off by default worked for intersecting one pair of datasets and giving very close results, but I have an example where the results are very different between sf and terra.

Data is available in an external repo related to my shiny app here.

I want to intersect the two datasets and have each row have a polygon with the columns CODE_12, the code of the polygon from sf1 (in 2012), and CODE_18, the same for sf2 (2018). terra::intersect does this operation considerably faster (2-20 times from what I've seen) than sf::st_intersection and so I want to implement it as an option as I have heard from some users that speed is a drawback of the app for larger datasets.

Despite this, the results are extremely different. The intersection gives a different number of rows (which could be okay), but the CODE_12 column of df_int only has two values, 111 and 112, whereas df_int2 is as expected. CODE_18 produces differences also, though not as drastic.

Is there any reason why this could be? Is it the data, my code, or terra itself? @kadyb any intuition on what might be going on?

library(sf)
library(terra)
library(dplyr)

sf_folder <- "map_data/Dargle"
f1 <- "dargle_CLC2012.shp"
f2 <- "dargle_CLC2018.shp"

sf_use_s2(FALSE)

#read data
sf1 <- st_read(file.path(sf_folder, f1)) %>% st_make_valid()
sf2 <- st_read(file.path(sf_folder, f2)) %>% st_make_valid()

terra1 <- vect(sf1)
terra2 <- vect(sf2)

df_int <- terra::intersect(terra1, terra2) %>% st_as_sf()

df_int %>%
  arrange(CODE_12) %>%
  select(geometry, CODE_12) %>%
  plot()

df_int2 <- st_intersection(sf1, sf2)

df_int2 %>%
  arrange(CODE_12) %>%
  select(geometry, CODE_12) %>%
  plot()

using dplyr_1.1.4, terra_1.7-55, sf_1.0-14.

Expected output - plotting df_int2 (using just sf):

plot_zoom_png plot_zoom_png

Observed output - plotting df_int (using terra and converting back to sf):

plot_zoom_png plot_zoom_png

kadyb commented 1 month ago

Try updating {terra} to the latest version. Using your code in df_int$CODE_12 column from {terra} I have 21 categories, like in {sf}.

df_int %>%
  arrange(CODE_12) %>%
  select(geometry, CODE_12) %>%
  plot()
unique(df_int$CODE_12)
#>  [1] "111" "112" "121" "122" "131" "133" "141" "142" "211" "231" "242" "243" "311" "312"
#> [15] "313" "322" "324" "333" "412" "512" "523"

BTW: In this line: df_int2 <- st_intersection(df1, df2), objects should rather be sf1 and sf2?

gibbona1 commented 1 month ago

Yes apologies, updated df1 and df2 to sf1 and sf2. Can confirm that updateing to terra_1.7-78 and sf_1.0-17 has solved the issue Thanks again!