hmsc-r / HMSC

Strategy for simplifying data environment to improve processing time & overall likelihood of model fitting #130

Open Basquill opened 2 years ago

Basquill commented 2 years ago

Hello -

I'm back to HMSC after a number of months on other tasks.

Prior to this break I was running trials with 429 plots, 8 continuous predictors, and 328 species (no traits or phylogeny). Trials ran for a long time (weeks), and generally resulted in poor fit. My predictors are not correlated (I ran VIF analyses), and previously I had eliminated categorical predictors figuring they were slowing things down and making the models more cumbersome.

Are there suggestions for simplifying one's data environment to reduce MCMC processing times, as the user explores the potential of various combinations of predictors? Previously, I reduced the size of my plot environment by only selecting treed vegetation. Now I'm wondering, for example, whether it's worthwhile truncating down to either just deciduous or coniferous forests.

I have previous experience in vegetation classification and ordination; when defining a particular vegetation type, we generally seek redundancy in the data. That is, it's easier to explain variation and get a tight unit concept if the plots are floristically similar to one another. If those patterns translate (at all) to spatial modelling of communities, I'm thinking a reasonable strategy would be to seek high redundancy in a smaller plot data environment.

Any suggestions are appreciated. Thanks very much.

cgoetsch commented 2 years ago

Hi,

I also have a large dataset/data environment, with 7000+ sampling units and 10 environmental covariates, but fewer species than you. I found by accident that turning off the gammaEta updater resulted in much faster runtimes, and I did not see a noticeable difference in convergence between the models with the gammaEta updater on vs. off. This may not help with convergence problems for complex models; I am still having trouble getting adequate convergence for my model even with very large MCMC sampling settings. Even so, these models run in 4-6 days without the gammaEta updater vs. over 16 days with it on.
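For anyone who wants to try the same comparison, here is a minimal sketch of how the updater can be switched off through the `updater` argument of `sampleMcmc`. The data are simulated purely for illustration (the sizes, the single predictor `x1`, and the plot-level random effect are assumptions, not the setup of either dataset above), and the MCMC settings are deliberately tiny, so the timings only show the mechanics rather than the speed-up reported here.

```r
library(Hmsc)
library(coda)

# Simulated stand-in data (hypothetical sizes and values, for illustration only)
set.seed(1)
n  <- 50   # sampling units
ns <- 4    # species
XData <- data.frame(x1 = rnorm(n))
Y <- matrix(rnorm(n * ns), nrow = n, ncol = ns)

# One unit-level random effect, so that the GammaEta updater has something to do
studyDesign <- data.frame(plot = factor(seq_len(n)))
rL <- HmscRandomLevel(units = studyDesign$plot)

m <- Hmsc(Y = Y, XData = XData, XFormula = ~ x1,
          studyDesign = studyDesign, ranLevels = list(plot = rL),
          distr = "normal")

# Very short runs: enough to compare timing, not enough for inference
t_on <- system.time(
  fit_on <- sampleMcmc(m, samples = 100, thin = 1, transient = 50,
                       nChains = 2)
)
t_off <- system.time(
  fit_off <- sampleMcmc(m, samples = 100, thin = 1, transient = 50,
                        nChains = 2,
                        updater = list(GammaEta = FALSE))  # updater switched off
)

# Compare elapsed time, then check convergence of the Beta parameters
print(rbind(GammaEta_on = t_on["elapsed"], GammaEta_off = t_off["elapsed"]))
psrf_on  <- gelman.diag(convertToCodaObject(fit_on)$Beta,
                        multivariate = FALSE)$psrf
psrf_off <- gelman.diag(convertToCodaObject(fit_off)$Beta,
                        multivariate = FALSE)$psrf
summary(psrf_on[, "Point est."])
summary(psrf_off[, "Point est."])
```

Whether the updater can be dropped without hurting mixing probably depends on the model, so it seems worth comparing the potential scale reduction factors of both fits, as above, before committing to a long run.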

Good luck!