rspatial / terra

R package for spatial data handling https://rspatial.github.io/terra/reference/terra-package.html
GNU General Public License v3.0
539 stars 89 forks source link

std::bad_alloc with disaggregate #996

Open mzarowka opened 1 year ago

mzarowka commented 1 year ago

Hi, disclaimer - I'm not actually an R programmer so maybe I'm missing something simple - probably I'm disaggregating in a wrong way?

I'm using terra to work with hyperspectral rasters of geological material (sediment cores). These are pretty big files with around 400 to almost 500 wavelengths (so number of layers) that can easily reach more than 60 GB. Terra is perfect solution, as operations with that kind of data in memory are rather hard and we are aiming at different hardware setups - much appreciation for your hard work devs!

At one point so called white reference (small white object) has to match exactly extent of captured data (long sediment core). We achieve it by first cropping it to the same width, then aggregating reference file into one mean row, and then disaggregating this mean row it to match captured data extent and cell size.

This works fine with smaller subsets and datasets of approx. 400 lyrs, but with datasets of approx. 500 lyrs it cannot allocate space. With subsets around 50-100 lyrs it works really fast.

I'm not sure how to provide more reproducible example, because failure occurs with very large datasets.

I tried using different tmp directories on SSD drives and making sure there is more than required space available, and increasing number of steps with: terra::terraOptions(steps = 1000, tempdir = "e:/tmp").

These are the custom functions that are used together:

raster_crop <- function(raster, type, roi, ref_type) {
  # If cropping entire capture SpatRaster use entire large ROI
  if (type == "capture") {
    raster <- terra::crop(raster, roi, filename = paste0(paths[["directory"]], "products/capture_cropped.tif"), overwrite = TRUE)
    # If cropping reference SpatRaster use only xmin and xmax from large ROI
  } else if (ref_type == "whiteref"){
    raster <- terra::crop(raster, c(terra::xmin(roi), terra::xmax(roi), terra::ymin(raster), terra::ymax(raster)), filename = paste0(paths[["directory"]], "products/whiteref_cropped.tif"), overwrite = TRUE)
  } else {
    raster <- terra::crop(raster, c(terra::xmin(roi), terra::xmax(roi), terra::ymin(raster), terra::ymax(raster)), filename = paste0(paths[["directory"]], "products/darkref_cropped.tif"), overwrite = TRUE)
  }

  # Return raster
  return(raster)
}
create_reference_raster <- function(raster, roi, ref_type) {
  if (ref_type == "whiteref"){
    name <- "whiteref"
  } else {
    name <- "darkref"
  }
  # Aggregate data into one row SpatRaster, "divide" by number of rows
  raster <- terra::aggregate(raster, fact = c(terra::nrow(raster), 1), fun = "mean", filename = paste0(paths[["directory"]], "products/", name, "_agg.tif"), overwrite = TRUE)

  # Set new extent to match extent of capture SpatRaster
  terra::ext(raster) <- roi

  # Disaggregate data over entire extent to mach capture SpatRaster extent, "multiply" by ymax
  raster <- terra::disagg(raster, fact = c(terra::ymax(raster), 1), filename = paste0(paths[["directory"]], "products/", name, "_disagg.tif"), overwrite = TRUE)

  # Return raster
  return(raster)
}

Chained together

big_roi <- terra::ext(c(340, 1560, 700, 34775))

whiteref <- terra::rast("capture/WHITEREF_PRI_22_PRI_22_AB_2023-01-16_13-44-50.raw")

whiteref <- raster_crop(raster = whiteref, type = "reference", roi = big_roi, ref_type = "whiteref") |>
  create_reference_raster(raster = _, roi = big_roi,  ref_type = "whiteref")
filename      : products/whiteref_cropped.tif
compute stats : 1, GDAL: 0, minmax: 0, approx: 1
driver        : GTiff
disk available: 1530.4 GB
disk needed   : 0.1 GB
memory avail. : 23.92 GB
memory allow. : 14.35 GB
memory needed : 1.731 GB  (4 copies)
in memory     : true
block size    : 100 rows
n blocks      : 100
pb            : 3

filename      : products/whiteref_agg.tif
compute stats : 1, GDAL: 0, minmax: 0, approx: 1
driver        : GTiff
disk available: 1530.3 GB
disk needed   : 0 GB
memory avail. : 23.69 GB
memory allow. : 14.22 GB
memory needed : 0.017 GB  (4 copies)
in memory     : true
block size    : 1 rows
n blocks      : 1
pb            : 900

filename      : products/whiteref_disagg.tif
compute stats : 1, GDAL: 0, minmax: 0, approx: 1
driver        : GTiff
disk available: 1538 GB
disk needed   : 75.2 GB
memory avail. : 23.64 GB
memory allow. : 14.18 GB
memory needed : 1.04646e+07 GB  (69550 copies)
in memory     : false
block size    : 1 rows
n blocks      : 34775
pb            : 3

This leads to

Error: std::bad_alloc
6. stop(structure(list(message = "std::bad_alloc", call = NULL,
cppstack = NULL), class = c("std::bad_alloc", "C++Error",
"error", "condition")))
5. x@ptr$disaggregate(fact, opt)
4. .local(x, ...)
3. terra::disagg(raster, fact = c(terra::ymax(raster), 1), filename = paste0(paths[["directory"]],
"products/", name, "_disagg.tif"), overwrite = TRUE)
2. terra::disagg(raster, fact = c(terra::ymax(raster), 1), filename = paste0(paths[["directory"]],
"products/", name, "_disagg.tif"), overwrite = TRUE)
1. create_reference_raster(raster = raster_crop(raster = whiteref,
type = "reference", roi = big_roi, ref_type = "whiteref"),
roi = big_roi, ref_type = "whiteref")

I'm working on Windows 11 machine with 32 GB RAM.

> sessionInfo()
R version 4.2.2 (2022-10-31 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 22621)

Matrix products: default

locale:
[1] LC_COLLATE=Polish_Poland.utf8  LC_CTYPE=Polish_Poland.utf8    LC_MONETARY=Polish_Poland.utf8 LC_NUMERIC=C                   LC_TIME=Polish_Poland.utf8    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.10      codetools_0.2-18 pak_0.4.0        fansi_1.0.4      withr_2.5.0      utf8_1.2.2       dplyr_1.1.0      R6_2.5.1         lifecycle_1.0.3  magrittr_2.0.3   pillar_1.8.1     rlang_1.0.6      cli_3.6.0       
[14] rstudioapi_0.14  vctrs_0.5.2      generics_0.1.3   tools_4.2.2      glue_1.6.2       purrr_1.0.1      compiler_4.2.2   pkgconfig_2.0.3  terra_1.7-3      tidyselect_1.2.0 tibble_3.1.8
rhijmans commented 1 year ago

Can you confirm that the below are the relevant steps in your script to get to the error? Here I am using a SpatRaster with no cell values, so the memory problem will not occur.

wref <- rast(ncol=1600, nrow=35000, nlyr=3, xmin=0, xmax=1600, ymin=0, ymax=35000)
roi <- ext(c(340, 1560, 700, 34775))
e <- c(as.vector(roi)[1:2], as.vector(ext(wref)[1:2]))
x <- crop(wref, e)
x <- aggregate(x, fact=c(nrow(x), 1), fun = "mean")
ext(x) <- roi
y <- disagg(x, fact=c(ymax(x), 1))

#class       : SpatRaster 
#dimensions  : 34775, 1220, 3  (nrow, ncol, nlyr)
#resolution  : 1, 0.9798706  (x, y)
#extent      : 340, 1560, 700, 34775  (xmin, xmax, ymin, ymax)
#coord. ref. :  

And can you please show(whiteref) so that it is clearer what you are dealing with?

mzarowka commented 1 year ago

Hi, I run your code with my whiteref and as you wrote, these are relevant steps leading to the std::bad_alloc

wref <- terra::rast("...capture/WHITEREF_PRI_22_PRI_22_AB_2023-01-16_13-44-50.raw")
roi <- ext(c(340, 1560, 700, 34775))
e <- c(as.vector(roi)[1:2], as.vector(ext(wref)[1:2]))
x <- crop(wref, e)
x <- aggregate(x, fact=c(nrow(x), 1), fun = "mean")
ext(x) <- roi
y <- disagg(x, fact=c(ymax(x), 1))
Error: std::bad_alloc---------|---------|

Raw white ref looks like this show(whiteref)

class       : SpatRaster 
dimensions  : 100, 2184, 476  (nrow, ncol, nlyr)
resolution  : 1, 1  (x, y)
extent      : 0, 2184, 0, 100  (xmin, xmax, ymin, ymax)
coord. ref. :  
source      : WHITEREF_PRI_22_PRI_22_AB_2023-01-16_13-44-50.raw 
names       : 397.65, 398.84, 400.02, 401.21, 402.39, 403.58, ...