rspatial / terra

R package for spatial data handling https://rspatial.github.io/terra/reference/terra-package.html
GNU General Public License v3.0

Memory issue with large SpatVector #888

Open DavidDHofmann opened 1 year ago

DavidDHofmann commented 1 year ago

Hello. I am working with a large matrix that contains 160 million rows of xy-coordinates that I want to reproject. This used to work perfectly fine in previous versions of terra, but now I'm running into memory issues. Just creating the SpatVector already occupies 43 GB of memory. Once I do the reprojection, my memory runs out entirely (I have 64 GB of RAM). Do you have any idea why this is happening? Below is an example of what I'm trying to achieve.

```r
# Load required packages
library(terra)

# Simulate some coordinates with 160 million rows
xy <- cbind(
    x = runif(160e6, 22, 26)
  , y = runif(160e6, -22, -17)
)

# Create SpatVector
xy <- vect(xy, crs = "epsg:4326", type = "points")

# Reproject coordinates -> Here I run out of memory
xy <- project(xy, "epsg:32734")
```

Here's my `sessionInfo()`:

```
R version 4.2.2 (2022-10-31)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Linux Mint 19.3

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/openblas/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/libopenblasp-r0.2.20.so

locale:
 [1] LC_CTYPE=de_CH.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=de_CH.UTF-8        LC_COLLATE=de_CH.UTF-8    
 [5] LC_MONETARY=de_CH.UTF-8    LC_MESSAGES=de_CH.UTF-8   
 [7] LC_PAPER=de_CH.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=de_CH.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] terra_1.6-40

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.9       codetools_0.2-18 fansi_1.0.3      crayon_1.5.2    
 [5] digest_0.6.30    utf8_1.2.2       IRdisplay_1.1    repr_1.1.4      
 [9] lifecycle_1.0.3  jsonlite_1.8.3   evaluate_0.17    pillar_1.8.1    
[13] rlang_1.0.6      cli_3.4.1        uuid_1.1-0       vctrs_0.5.0     
[17] IRkernel_1.3     tools_4.2.2      glue_1.6.2       fastmap_1.1.0   
[21] compiler_4.2.2   base64enc_0.1-3  pbdZMQ_0.3-7     htmltools_0.5.3 
```
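For scale, the coordinates themselves account for only a small part of the 43 GB reported above: 160 million rows × 2 columns × 8 bytes per double is about 2.56 GB. The base-R sketch below does that arithmetic and cross-checks it with `object.size()` on a smaller matrix (1 million rows, a size picked only for illustration). The gap to 43 GB presumably comes from how the SpatVector stores each point rather than from the coordinates, but that is an inference, not a measurement.

```r
# Raw size of the coordinates: n rows x 2 columns x 8 bytes per double
n <- 160e6
raw_bytes <- n * 2 * 8
raw_bytes / 1e9     # 2.56 (GB)

# Cross-check with object.size() on a smaller matrix, then scale up
n_small <- 1e6
xy_small <- cbind(
    x = runif(n_small, 22, 26)
  , y = runif(n_small, -22, -17)
)
per_row <- as.numeric(object.size(xy_small)) / n_small
per_row             # just over 16 bytes per row (matrix header included)
per_row * n / 1e9   # about 2.56 GB for 160 million rows
```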
rhijmans commented 1 year ago

Thank you. I need to redesign SpatVector a bit to better accommodate very many points. For now, here is a work-around for your specific use case (you do not need to create a SpatVector). I am guessing that this is what you may have done before, because I do not see a difference in memory needs when comparing with older versions.

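```r
library(terra)
xy <- cbind(
    x = runif(16, 22, 26)
  , y = runif(16, -22, -17)
)

# project() also accepts a two-column matrix with explicit source and target CRS,
# so no SpatVector needs to be created
xy <- project(xy, "epsg:4326", "epsg:32734")
```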
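To check that the matrix route returns the same coordinates as going through a SpatVector, one can project a small sample both ways and compare. This is a sketch that assumes terra's `crds()` for pulling the coordinates out of the projected SpatVector; the 1,000-point sample and the seed are arbitrary.

```r
library(terra)

set.seed(1)
xy <- cbind(
    x = runif(1000, 22, 26)
  , y = runif(1000, -22, -17)
)

# Route 1: matrix in, matrix out (no SpatVector is created)
m <- project(xy, "epsg:4326", "epsg:32734")

# Route 2: build a SpatVector, project it, then extract the coordinates
v <- project(vect(xy, crs = "epsg:4326", type = "points"), "epsg:32734")
v_xy <- crds(v)

# Both routes should give the same UTM coordinates (up to floating-point noise)
all.equal(unname(m), unname(v_xy))
```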
kadyb commented 1 year ago

@rhijmans, does your example only use PROJ (without GDAL)? Like sf::sf_project()?

rhijmans commented 1 year ago

@kadyb no, this goes through OGR/GDAL
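For comparison, the PROJ-only route kadyb mentions could look like the sketch below. It assumes the sf package is installed; `sf::sf_project()` takes the source CRS, the target CRS and a coordinate matrix, and calls PROJ without going through GDAL. Whether it needs less memory than terra's OGR/GDAL path for 160 million points was not tested in this thread.

```r
library(terra)

set.seed(1)
xy <- cbind(
    x = runif(1000, 22, 26)
  , y = runif(1000, -22, -17)
)

# terra: goes through OGR/GDAL
m_terra <- project(xy, "epsg:4326", "epsg:32734")

# sf: calls PROJ directly on the matrix, without GDAL
m_sf <- sf::sf_project("EPSG:4326", "EPSG:32734", xy)

# The two results should agree to within floating-point noise
max(abs(m_terra - m_sf))
```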

DavidDHofmann commented 1 year ago

Great, this works well indeed. I never modified my old code, so I'm still a bit unsure what caused the sudden memory overflow. But anyhow, it works now. Thanks!

strevisani commented 1 year ago

Hi,

Regarding memory issues: I had a similar problem, but with raster data. When using the focal() function extensively on relatively large rasters, I got the error message "std::bad_alloc". Interestingly, I ran the same routine with the same data on three different Windows PCs: a very old one (Intel Core Duo, 4 GB RAM, Windows 8), an Intel i5 6th generation (8 GB RAM, Windows 10), and an Intel i7 8th generation (16 GB RAM, Windows 11).

Surprisingly, I only had the issue on the machine with 16 GB of RAM; the others completed the computation without any problems. Changing the terraOptions() "memfrac" option from 0.6 to 0.1 resolved the issue. I know that vectors are managed differently from rasters, but maybe this information is useful.
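For reference, the setting described above can be changed for the session as sketched below. In terra, `memfrac` is the fraction of available RAM that terra may use for raster processing (0.6 by default); when a job does not fit, terra processes it in chunks and may write to temporary files instead. A lower value trades speed for a smaller peak footprint. `mem_info()` reports, for a given raster, whether terra expects to process it in memory; the 1000 × 1000 raster is only an example.

```r
library(terra)

# Print the current options (memfrac defaults to 0.6)
terraOptions()

# Allow terra to use at most about 10% of available RAM for raster processing
terraOptions(memfrac = 0.1)

# Example raster: report whether terra would process it in memory under this setting
r <- rast(nrows = 1000, ncols = 1000, vals = 1)
mem_info(r)
```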

Sebastiano