r-spatial / sf

Simple Features for R
https://r-spatial.github.io/sf/
Other
1.35k stars 299 forks source link

R version 4.2.3 closes unexpectedly using st_intersection #2417

Closed clozan2 closed 2 months ago

clozan2 commented 3 months ago

I am not sure if this is a bug issue so am posting under this topic category.

I'm working with fairly large geospatial data sets and have previously solved some of my issues after reading a thread on a related issue (Memory issue using st_intersection #394). Using advice from this thread, I was able to reduce the processing time of my code from days to under 3 hours! However, I'm now running into a bigger problem. I've updated my analyses (code copied below), and R closes each time I try to run st_intersection for certain geospatial objects, but not all. I suspect that the issue is due to memory limitations, but I've tried this on two machines, one with much higher processing capabilities. The last line in the code below is the one that causes R to close. Note, I've left in some lines of code that I was trying to use to fix potential memory issues (e.g., "rm(...)"). Any ideas would be greatly appreciated!

I'm using Windows 11 PRO R version 4.2.3 (2023-03-15 ucrt) Platform: x86_64-w64-mingw32/x64 (64-bit) Running under: Windows 10 x64 (build 22631)

Matrix products: default

locale: [1] LC_COLLATE=English_United States.utf8 LC_CTYPE=English_United States.utf8 LC_MONETARY=English_United States.utf8 LC_NUMERIC=C LC_TIME=English_United States.utf8

attached base packages: [1] stats graphics grDevices utils datasets methods base

other attached packages: [1] paletteer_1.6.0 stars_0.6-1 abind_1.4-5 terra_1.7-23 rgdal_1.6-7 sp_1.6-0 sf_1.0-13 FedData_4.0.1 tigris_2.1 leaflet_2.1.2

loaded via a namespace (and not attached): [1] Rcpp_1.0.10 pillar_1.9.0 compiler_4.2.3 class_7.3-21 tools_4.2.3 digest_0.6.35 uuid_1.1-0 lattice_0.20-45 lifecycle_1.0.4 tibble_3.2.1 pkgconfig_2.0.3 rlang_1.1.3
[13] DBI_1.1.3 cli_3.6.2 crosstalk_1.2.0 curl_5.2.1 parallel_4.2.3 fastmap_1.1.1 e1071_1.7-13 s2_1.1.6 xml2_1.3.3 dplyr_1.1.4 httr_1.4.5 stringr_1.5.1
[25] generics_0.1.3 vctrs_0.6.5 htmlwidgets_1.6.4 rappdirs_0.3.3 hms_1.1.3 classInt_0.4-9 grid_4.2.3 tidyselect_1.2.1 glue_1.7.0 R6_2.5.1 fansi_1.0.6 rematch2_2.1.2
[37] readr_2.1.4 tzdb_0.3.0 magrittr_2.0.3 codetools_0.2-19 htmltools_0.5.8.1 units_0.8-1 utf8_1.2.4 KernSmooth_2.23-20 proxy_0.4-27 stringi_1.8.3 wk_0.7.2 lwgeom_0.2-13

####################################################################################################

library(tigris)
options(tigris_use_cache=TRUE)
library(FedData)
library(sf)
'%notin%'<-function(x,y)!('%in%'(x,y))

onondaga20<-block_groups(state='NY',county='Onondaga',year=2020)
cayuga20<-block_groups(state='NY',county='Cayuga',year=2020)
cortland20<-block_groups(state='NY',county='Cortland',year=2020)
madison20<-block_groups(state='NY',county='Madison',year=2020)
oswego20<-block_groups(state='NY',county='Oswego',year=2020)

studyarea<-rbind(cayuga20,cortland20,madison20,oswego20,onondaga20)

###Download and process NLCD data for the study area
sa.nlcd01<-get_nlcd(template=studyarea,label='SA',year=2001,dataset='landcover',landmass='L48') ###Note: A runtime error sometimes occurs, possibly due to internet speed. Rerun if needed.
sa.nlcd21<-get_nlcd(template=studyarea,label='SA',year=2021,dataset='landcover',landmass='L48')

sa.nlcd01b<-terra::as.polygons(sa.nlcd01)
sa.nlcd21b<-terra::as.polygons(sa.nlcd21)

sa.nlcd01c<-sf::st_as_sf(sa.nlcd01b)
sa.nlcd21c<-sf::st_as_sf(sa.nlcd21b)

sa.nlcd01d<-st_transform(sa.nlcd01c,crs=4269)
sa.nlcd21d<-st_transform(sa.nlcd21c,crs=4269)

studyareab<-st_union(studyarea)

sf_use_s2(FALSE)
sa.nlcd01e<-sf::st_intersection(sa.nlcd01d,studyareab)
sa.nlcd21e<-sf::st_intersection(sa.nlcd21d,studyareab)

###Create unified spatial polygons for each of the geographies
onondaga20b<-st_union(onondaga20)

###Conduct analyses on bat habitat increases
other21<-st_union(sa.nlcd21e[sa.nlcd21e$Class %notin% c('Deciduous Forest','Evergreen Forest','Mixed Forest','Woody Wetlands','Developed High Intensity','Developed, Low Intensity','"Developed, Medium Intensity'),])
forest01<-st_union(sa.nlcd01e[sa.nlcd01e$Class %in% c('Deciduous Forest','Evergreen Forest','Mixed Forest','Woody Wetlands'),])
loss0121<-sf::st_intersection(forest01,other21)
loss0121$area<-st_area(loss0121)

##Separate by geographies
sf_use_s2(FALSE)
onondaga_l<-st_intersection(loss0121,onondaga20b)
edzer commented 3 months ago

Please try to minimize the script and the number of packages needed. Then, provide a reproducible example.

clozan2 commented 3 months ago

Please try to minimize the script and the number of packages needed. Then, provide a reproducible example.

Thanks for the quick response. I have provided a minimized version of the script in the edited post. I hope it has been reduced enough. This version still produces the issue.

edzer commented 3 months ago

The script finishes for me, but used quite a bit of memory ( > 7 Gb).

rsbivand commented 3 months ago

@edzer I believe you were not using R 4.2 on Windows, then with sf built against an older GEOS and GDAL? I agree that this looks like a Windows memory question, but R 4.2 is no longer supported. Could @clozan2 upgrade to R 4.4, install the current sf binary built against newer GEOS and GDAL, and run the workflow in that setting, monitoring memory usage? It is important not to run under any IDE, either in Rgui.exe or in a Windows terminal/console, to reduce extraneous memory usage.

clozan2 commented 3 months ago

@rsbivand Thank you the response! I will upgrade to R 4.4 and try.

@edzer good to know that it finished when you tried.

rsbivand commented 3 months ago

On Win 10, R 4.4, Rgui error-exits and crashes on onondaga_l<-st_intersection(loss0121,onondaga20b). Memory use > 11GB. Will try again, saving intermediate objects.

rsbivand commented 3 months ago

And in an Rtools44 terminal:

> load("GEOS_crash.rda")
> sf_use_s2(FALSE)
Spherical geometry (s2) switched off
> onondaga_l<-st_intersection(loss0121,onondaga20b)
although coordinates are longitude/latitude, st_intersection assumes that they
are planar
Segmentation fault

The first object is an extremely complex sfc, the second a simpler sfc. This also crashes when using s2 (which should be used) rather than GEOS (which shouldn't be used for spherical coordinates). My intuition is that the workflow is highly suboptimal, and probably should stay SpatRaster all the way, with no transformation of the polygonised raster to geographical coordinates - rather project the boundaries to the CRS of the raster.

rsbivand commented 3 months ago
library(tigris)
options(tigris_use_cache=TRUE)
library(FedData)
library(sf)
'%notin%'<-function(x,y)!('%in%'(x,y))
onondaga20<-block_groups(state='NY',county='Onondaga',year=2020)
cayuga20<-block_groups(state='NY',county='Cayuga',year=2020)
cortland20<-block_groups(state='NY',county='Cortland',year=2020)
madison20<-block_groups(state='NY',county='Madison',year=2020)
oswego20<-block_groups(state='NY',county='Oswego',year=2020)
studyarea<-rbind(cayuga20,cortland20,madison20,oswego20,onondaga20)
library(terra)
###Download and process NLCD data for the study area
sa.nlcd01<-get_nlcd(template=studyarea,label='SA',year=2001,dataset='landcover',landmass='L48') ###Note: A runtime error sometimes occurs, possibly due to internet speed. Rerun if needed.
sa.nlcd21<-get_nlcd(template=studyarea,label='SA',year=2021,dataset='landcover',landmass='L48')
studyarea <- vect(st_transform(studyarea, "EPSG:5070"))
onondaga20 <- vect(st_transform(onondaga20, "EPSG:5070"))
studyareab <- aggregate(studyarea)
sa.nlcd01e <- mask(sa.nlcd01, studyareab)
sa.nlcd21e <- mask(sa.nlcd21, studyareab)
onondaga20b <- aggregate(onondaga20)
other21_0 <- sa.nlcd21e$Class %notin% c('Deciduous Forest','Evergreen Forest','Mixed Forest','Woody Wetlands','Developed High Intensity','Developed, Low Intensity','"Developed, Medium Intensity') 
forest01_0 <- sa.nlcd01e$Class %in% c('Deciduous Forest','Evergreen Forest','Mixed Forest','Woody Wetlands')
other21_1 <- mask(other21_0, onondaga20b)
forest01_1 <- mask(forest01_0, onondaga20b)
onondaga_l <- as.polygons(as.factor(forest01_1 & other21_1))
expanse(onondaga_l)
plot(onondaga_l)
mapview::mapview(onondaga_l)

I rarely use terra and remain unsure what the product is supposed to be, but added area calculation here after converting to SpatVector. This works fine on Windows 10 too.