r-spatial / stars

Spatiotemporal Arrays, Raster and Vector Data Cubes
https://r-spatial.github.io/stars/
Apache License 2.0

stars_proxy memory hog #708

Open dazu89 opened 2 months ago

dazu89 commented 2 months ago

I want to build a high-dimensional data cube from raster files in plain-text ASCII grid format. To do so, I (1) read the metadata of all files (file path and attributes) into a data frame, (2) group by dimension and concatenate the files in each group into a stars_proxy, and (3) combine these stars_proxy objects into a higher-dimensional stars_proxy, similar to the process described in this post on StackExchange or this GitHub issue.
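
Steps (1) and (2) can be sketched roughly as follows. This is only an illustration: the column names `path`, `variable`, and `step` and the file names are hypothetical placeholders, not my actual data.

```r
# Hypothetical metadata table: one row per raster file (step 1).
# Column names and file names are illustrative assumptions.
meta <- data.frame(
  path     = c("a_2.asc", "a_1.asc", "b_1.asc", "b_2.asc"),
  variable = c("a", "a", "b", "b"),
  step     = c(2, 1, 1, 2),
  stringsAsFactors = FALSE
)

# Order the files within each group, then split the paths by group (step 2).
meta <- meta[order(meta$variable, meta$step), ]
files_by_variable <- split(meta$path, meta$variable)
str(files_by_variable)

# With real files, this named list has the same shape as the list passed to
# read_stars() in the tif example further down:
# r <- stars::read_stars(files_by_variable, proxy = TRUE)
```

The result is a named list with one character vector of file paths per group, which is the shape of the `list(a = ..., b = ...)` argument used in the tif example.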

When I load the stars_proxy via my_stars_proxy |> st_as_stars(), memory usage climbs to tens of GB, even if only a couple of files of 5-10 MB each are read. The problem only occurs with files of the following format:

ncols                   500
nrows                  500
xllcorner              6.5
yllcorner              -65.5
cellsize                 0.002
NODATA_value            -9.9990E+03
-9.9990E+03 -9.9990E+03 -9.9990E+03 -9.9990E+03 -9.9990E+03 ...
-9.9990E+03 -9.9990E+03  0.5000E-02  1.5000E+02 -9.9990E+03 ...
-9.9990E+03 -9.9990E+03 -9.9990E+03 -9.9990E+03 -9.9990E+03 ...
.           .           .           .           .           .
.           .           .           .           .            .
.           .           .           .           .             .

With standard data, by contrast, no such problem occurs and only a few hundred MB are used:

library(stars)
library(profmem)
# Only log allocations of at least 1 MB
options(profmem.threshold = 1e6)
tif = system.file("tif/L7_ETMs.tif", package = "stars")
# In-memory read, for reference
rs_mem = read_stars(tif)
print(object.size(rs_mem), standard = "SI", units = "auto")
# Build a higher-dimensional stars_proxy from the same file
r = read_stars(list(a = c(tif, tif), b = c(tif, tif)), proxy = TRUE)
(xx = st_redimension(r, along = list(foo = 1:4)))
(rr = c(xx, xx))
(rrr = st_redimension(rr, along = list(bar = as.Date(c("2001-01-01", "2002-01-01")))))
# Profile the memory allocated while loading the proxy
p <- profmem({
  test = rrr |> st_as_stars()
})
# Total logged allocations in MB
sum(p$bytes, na.rm = TRUE) / 1e6
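
To try the same thing with the ASCII grid format shown above, a small stand-in file can be written with base R. This is a minimal sketch: the grid size, the cell values, and the file name `sample.asc` are made up, and the values are written as `5.0000E-03` rather than `0.5000E-02`, so it only approximates my files.

```r
# Write a small ASCII grid that mimics the header shown above.
# Grid size, values, and file location are made up for illustration.
ncols  <- 5L
nrows  <- 4L
nodata <- -9999

set.seed(1)
vals <- matrix(runif(ncols * nrows, 0, 150), nrow = nrows, ncol = ncols)
vals[1, ] <- nodata  # first row is all NODATA, as in the sample

header <- c(
  sprintf("ncols        %d", ncols),
  sprintf("nrows        %d", nrows),
  "xllcorner    6.5",
  "yllcorner    -65.5",
  "cellsize     0.002",
  sprintf("NODATA_value %s", formatC(nodata, format = "E", digits = 4))
)

# One text line per grid row, values in exponent notation.
grid_lines <- apply(vals, 1, function(row) {
  paste(formatC(row, format = "E", digits = 4), collapse = " ")
})

asc_path <- file.path(tempdir(), "sample.asc")
writeLines(c(header, grid_lines), asc_path)

# Size of the cell values alone when held in memory as doubles (8 bytes each).
expected_bytes <- ncols * nrows * 8

# Only attempt the read if stars is installed.
if (requireNamespace("stars", quietly = TRUE)) {
  x <- stars::st_as_stars(stars::read_stars(asc_path, proxy = TRUE))
  print(object.size(x), units = "auto")
}
```

By the same arithmetic, the cell values of one 500 x 500 grid take 500 * 500 * 8 bytes = 2 MB as doubles, so I would expect a handful of such files to need far less than the tens of GB I observe.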

I suspect I should pass some options to read_stars(), but so far I have no good guess as to which.