hmsc-r / HMSC

GNU General Public License v3.0

[1] "thin = 10; samples = 250" [1] "model = presence_absence" Computing chain 1 Error: cannot allocate vector of size 406.6 Mb #104

Open bwegsche opened 3 years ago

bwegsche commented 3 years ago

Hello everyone,

I'm using Hmsc to model joint distributions of birds (presence-absence) across a watershed in Canada. We have obtained data from a database including 2060 surveyed sites and 102 observed species. I included 3 continuous environmental predictors and two trait variables (diet and log body mass) in my model formula. We don't include phylogenetic information. I'm using the template for the model pipeline including 8 steps available at the HMSC/Statistical Ecology website. The steps S1 data preparation and S2 define model worked without issues. However, when fitting the model (S3) with sampleMcmc() I get the following error message for the parameter set thin = 10 and samples = 250: "Computing chain 1 Error: cannot allocate vector of size 406.6 Mb".

The model produced a result and ran without issues for thin = 1 and samples = 5.

Is my dataset simply too large or am I missing an important point to reduce the size of the output? I hope this description is useful to diagnose the issue. Please let me know if you need further information.
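For context, the S3 fitting call in the pipeline template looks roughly like the sketch below (the values of transient, nChains and verbose here are placeholders, not necessarily the ones Bernhard used):

```r
library(Hmsc)

# Sketch of the S3 fitting step: m is the Hmsc model defined in S2.
# thin = 10, samples = 250 are the settings that triggered the error;
# transient/nChains/verbose are assumed placeholder values.
m <- sampleMcmc(m, thin = 10, samples = 250,
                transient = 125, nChains = 2, verbose = 25)
```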

Best regards, Bernhard

ovaskain commented 3 years ago

Hi,

Your data looks fine in the sense of not being too large for Hmsc analyses. I wonder if you included a full spatial model? With 2060 sites that would result in 2060 x 2060 variance-covariance matrices (100 of them, one for each spatial scale parameter) that would blow up memory. If this is the case, use e.g. NNGP as the method for spatial random effect.
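A minimal sketch of switching the spatial random effect from a full Gaussian process to NNGP, assuming `xy` is a matrix of site coordinates whose row names match `studyDesign$site` (all object names here are illustrative, not taken from the thread):

```r
library(Hmsc)

# Full GP ("Full") stores dense 2060 x 2060 covariance matrices;
# "NNGP" approximates it using only the nearest neighbours of each site,
# which keeps memory use roughly linear in the number of sites.
rL.spatial <- HmscRandomLevel(sData = xy, sMethod = "NNGP", nNeighbours = 10)

m <- Hmsc(Y = Y, XFormula = XFormula, XData = XData,
          studyDesign = studyDesign,
          ranLevels = list(site = rL.spatial),
          distr = "probit")
```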

Otso


andburch commented 3 years ago

Along the same lines, when I run computePredictedValues() on my model I get a similar error: Error: cannot allocate vector of size 8.3 Gb. That's a pretty huge vector, and I'm not sure where this is coming from.

My toy model has 119 species, 1 covariate (with 10 categories), 1 trait, and a random effect of time (24 years) and XY location (130 unique points). My trait and my covariate are dummy covariates, I haven't even included the real data there. Is my model somehow already too big?

(Hmsc object with 3106 sampling units, 119 species, 10 covariates, 1 traits and 2 random levels. Posterior MCMC sampling with 3 chains each with 1000 samples, thin 5 and transient 2500)

jarioksa commented 3 years ago

I cannot see any obvious reason for the huge allocations in the info that you have supplied. Please show the output of getCall(<theNameOfYourHmscModel>).

andburch commented 3 years ago

Hmsc(Y = Y, XFormula = XFormula, XData = XData, studyDesign = studyDesign, 
    ranLevels = list(ident = rL.spatial, time = rL.time), distr = "lognormal poisson")

Basquill commented 2 years ago

I'm getting a similar (but even more extreme) memory error for S4_evaluate convergence.
Error: cannot allocate vector of size 23.1 Gb (!)

Any ideas? I can request more memory at the HPC, but this is very large

Running a hurdle model on 429 plots (no traits or phylogeny); 1 random level. 11 predictors (4 categorical). Ypa = 995 variables

Hmsc(Y = Ypa, XFormula = XFormula, XData = X, studyDesign = studyDesign, ranLevels = { list(plot = rL.plot) }, distr = c(rep("normal", 10), rep("probit", 985)))

Hmsc(Y = (Yabu), XFormula = XFormula, XData = X, YScale = TRUE, studyDesign = studyDesign, ranLevels = { list(plot = rL.plot) }, distr = "normal")

Thanks very much.

Basquill commented 2 years ago

@andburch Did you solve your memory error? I'm getting something similar after running the S4 script. There is no obvious reason why my vector should require >20 gb of memory. Thanks

cgoetsch commented 2 years ago

Hi,

I am also running into this issue with computePredictedValues and evaluateModelFit:

  preds = computePredictedValues(m)

Error: cannot allocate vector of size 4.6 Gb

  MF = evaluateModelFit(hM=m, predY=preds)

Error: cannot allocate vector of size 4.6 Gb

We are running a NNGP spatial model with 2 additional random effects: an unstructured random effect at the sample level and a temporal random effect. We have been having issues getting the spatial random effect to converge, so to that end, we have increased our sampling of the posterior to 2000 samples, which improves our psrf values.

Here are our model specifications: Hmsc object with 7320 sampling units, 14 species, 13 covariates, 1 traits and 3 random levels Posterior MCMC sampling with 3 chains each with 2000 samples, thin 200 and transient 4e+05

And the model call: Hmsc(Y = Y, XFormula = XFormula.1.c, XData = X, studyDesign = studyDesign, ranLevels = list(tow = rL.tow, grid_id = rL.spatial, year = rL.time), distr = "probit")

Even with the error, I am getting a preds object (length 614880000; size 4.6 GB; value: large array) and an MF object.

So, I am confused. Are the functions working? Can I trust the output in preds and MF or is there something wrong?

Thanks!

ovaskain commented 2 years ago

Hi,

With 7320 sampling units and 14 species, the linear predictor (or equally well, the prediction itself) has 7320 x 14 = 102,480, so ca. 10^5 numbers. With 3 chains of 2000 samples you have 6000 copies of the linear predictor, so holding those in memory already takes ca. 6*10^8 numbers. Given that there are also other parameters involved, no wonder that you need a lot of space. We should probably make a version that returns only e.g. the posterior mean prediction, for which not all samples would need to be stored, as the entire posterior distribution of the prediction is probably not always needed and can be excessively large. What you could do yourself is to loop over the posterior samples, make the prediction for each one, sum them up, and divide by their number. So first check whether the prediction works if you modify the posterior so that it has just one (or a small number) of samples instead of 6000. Then you know you can work around the problem with a small amount of additional coding.
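The loop-over-samples work-around described above might be sketched as follows (a sketch only; `m` is assumed to be the fitted Hmsc model, and the running-sum approach means the full samples-by-sites-by-species prediction array is never held in memory at once):

```r
library(Hmsc)

# Pool the posterior samples from all chains into one list.
post <- poolMcmcChains(m$postList)

# Predict for one posterior sample at a time and accumulate a running sum;
# predict() returns a list of sites x species matrices, one per sample.
predSum <- 0
for (i in seq_along(post)) {
  predSum <- predSum + predict(m, post = post[i], expected = TRUE)[[1]]
}

# Posterior mean prediction (sites x species matrix).
predMean <- predSum / length(post)
```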

Best,

O2


jarioksa commented 2 years ago

Also note that there was a glitch in computePredictedValues before March 3, 2021: before that, computePredictedValues did not honour the thin argument but tried to store all samples. In your case with thin = 200 this means a 200-fold excess need of storage. This fix is not yet in the latest CRAN release, so you should install Hmsc from GitHub if this is your problem. See issue #86 for discussion, details and further pointers.
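Installing the development version with the fix can be done, for example, via the remotes package:

```r
# Install the development version of Hmsc from GitHub
# (install.packages("remotes") first if remotes is not yet available).
remotes::install_github("hmsc-r/HMSC")
```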

cgoetsch commented 2 years ago

@jarioksa

Thanks. I last updated from github on May 12, so this probably isn't the problem. But I will update again just to be sure.

cgoetsch commented 2 years ago

@jarioksa and @ovaskain

I updated HMSC from github and reran computePredictedValues and evaluateModelFit. And they both ran without errors, although it says that the array for preds is the same size. Anyway, it seems to be working now, but if we have to increase samples or have a bigger model in the future, I will definitely look at Otso's suggestion for a work-around.

Thanks to both of you.

cgoetsch commented 2 years ago

Just a quick update: it appears the solution was not updating the version of HMSC but completely restarting R/RStudio. Restarting R cleared previously allocated memory and allowed the preds vector to be allocated. Deleting large R objects from the environment and freeing memory within the same R session does not solve the issue.