hsbadr / HiClimR

Hierarchical Climate Regionalization
https://hsbadr.github.io/HiClimR/
GNU General Public License v3.0

Memory issue : cannot allocate vector of size #5

Open mohseniaref-InSAR opened 2 years ago

mohseniaref-InSAR commented 2 years ago

Hi,

I got the following error and was wondering if you could help me with it. Is there any way to deal with memory issues and big data?

Best regards, Mohammad

library(HiClimR)
library(ncdf4)
nc_data <- nc_open('/raid-manaslu/maref/InSAR/S1/QdtFilterSenAT76/geo_timeseries_ramp_demErr_multiply-1_msk.nc')
nc <- ncvar_get(nc_data, "timeseries")
lon <- ncvar_get(nc_data, "longitude")
lat <- ncvar_get(nc_data, "latitude", verbose = FALSE)
t <- ncvar_get(nc_data, "time")
xGrid <- grid2D(lon = unique(lon), lat = unique(lat))
lonn <- c(xGrid$lon)
latt <- c(xGrid$lat)
n <- aperm(nc, c(3,2,1))
x <- t(matrix(n, nrow=dim(n)[1], byrow=FALSE))
y <- HiClimR(x, lon = lonn, lat = latt, lonStep = 1, latStep = 1,
             geogMask = FALSE, meanThresh = FALSE, varThresh = 0,
             detrend = FALSE, standardize = TRUE, nPC = NULL,
             method = "single", hybrid = FALSE, kH = NULL, members = NULL,
             nSplit = 4, upperTri = TRUE, verbose = TRUE,
             validClimR = TRUE, k = 12, minSize = 1, alpha = 0.01,
             plot = TRUE, colPalette = NULL, hang = -1, labels = FALSE)

The error


PROCESSING STARTED

Checking Multivariate Clustering (MVC)...
---> x is a matrix
---> single-variate clustering: 1 variable
Checking data...
---> Checking dimensions...
---> Checking row names...
---> Checking column names...
Data filtering...
---> Computing mean for each row...
---> Checking rows with mean bellow meanThresh...
---> 60485454 rows found, mean ≤  FALSE
---> Computing variance for each row...
---> Checking rows with near-zero-variance...
---> 41238667 rows found, variance ≤  0
Data preprocessing...
---> Applying mask...
---> Checking columns with missing values...
---> Standardizing data...
Agglomerative Hierarchical Clustering...
---> Computing correlation/dissimilarity matrix...
Error: cannot allocate vector of size 479897.8 Gb

hsbadr commented 2 years ago

y <- HiClimR(x, lon = lonn, lat = latt, lonStep = 1, latStep = 1, geogMask = FALSE, meanThresh = FALSE, varThresh = 0, detrend = FALSE,standardize = TRUE, nPC = NULL, method = "single", hybrid = FALSE, kH = NULL,members=NULL,nSplit = 4,upperTri = TRUE, verbose = TRUE,validClimR = TRUE, k = 12, minSize = 1, alpha = 0.01,plot = TRUE, colPalette = NULL, hang = -1,labels = FALSE)

Data filtering...
---> 60485454 rows found, mean ≤  FALSE
---> 41238667 rows found, variance ≤  0

@mohseniaref-InSAR meanThresh should be numeric, not logical. Also, it seems that a large number of rows have been filtered out with variance ≤ 0. As for the memory allocation error, the dissimilarity matrix for big data requires a large amount of memory. You may try coarsening the spatial resolution (lonStep and/or latStep > 1) or increasing the number of splits (nSplit > 1):

  # Coarsening spatial resolution: raise lonStep and/or latStep above 1
  lonStep = 1, latStep = 1,

  # Big data support: raise nSplit above 1
  nSplit = 1, upperTri = TRUE, verbose = TRUE,
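
To get a feel for why coarsening helps, here is a rough back-of-the-envelope sketch in base R. It assumes the dissimilarity matrix is a dense n x n matrix of doubles (8 bytes per entry) and that coarsening keeps roughly every lonStep-th and latStep-th grid point. Both are simplifying assumptions, and the helper names `dense_matrix_gib` and `coarse_points` as well as the 1000 x 1000 grid are hypothetical, so the numbers will not reproduce the exact size in the error message, which also depends on how HiClimR applies nSplit and upperTri internally.

```r
# Approximate memory (GiB) for a dense n x n matrix of doubles.
# Assumption: full square matrix, 8 bytes per entry.
dense_matrix_gib <- function(n) {
  8 * as.numeric(n)^2 / 2^30
}

# Approximate number of grid points left after keeping every
# lon_step-th longitude and every lat_step-th latitude.
# Assumption: a regular grid with no rows removed by filtering.
coarse_points <- function(n_lon, n_lat, lon_step = 1, lat_step = 1) {
  ceiling(n_lon / lon_step) * ceiling(n_lat / lat_step)
}

# Hypothetical 1000 x 1000 grid, for illustration only.
n_lon <- 1000
n_lat <- 1000

for (step in c(1, 2, 5, 10)) {
  n <- coarse_points(n_lon, n_lat, step, step)
  cat(sprintf("step = %2d: %8.0f points, ~%.2f GiB\n",
              step, n, dense_matrix_gib(n)))
}
```

Because the matrix size grows with the square of the number of grid points, doubling both steps cuts the point count by about 4 and the memory by about 16 under these assumptions.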