sylvainschmitt / SSDM

Stacked Species Distribution Modelling R package

[SSDM] error in cluster: Input occurrence problem ? #100

pvpoli closed this issue 3 years ago

pvpoli commented 4 years ago

I'm trying to run the SSDM package to model different species assemblages. I have a presence-absence dataset, like the attached example (imput_SSDM.txt). When I run the stack_modelling function with both presences and absences, I always get "Error in unserialize(node$con)", which usually points to one core running out of memory. I get this error no matter how many cores I select (from 1 to 3), but not when I input only presence data. As you can see in the attached file, my dataset is not huge; in fact, I've been able to run much bigger simulations with the same algorithms using the biomod2 package. I suspect the problem is in the way the data is formatted. Could you give me any insight into this error? The line I'm trying to run is as follows:

SSDM <- stack_modelling(c('GLM', 'RF', 'GBM', 'GAM'), pa.lin, bioIX, ensemble.thresh = 0, Xcol = 'LONGITUDE', Ycol = 'LATITUDE', Spcol = 'SPECIES', rep = 10, method = "pSSDM", verbose = TRUE, cores = 3)

Thank you all in advance for any help you can give.

lukasbaumbach commented 4 years ago

In general, SSDM produces rather large objects, since it tries to bundle a lot of information in one place (the Stacked.SDM object). For that to happen, all data needs to be in memory at some point for ensembling and stacking, and the rasters make up most of that memory. So, alas, it's not really memory-optimized. How much memory you need depends mainly on the spatial extent you are modelling.

I would recommend setting the tmp parameter to TRUE or, even better, to a path where temporary files can be stored; this can remove a lot of the memory spikes. Also, depending on your setup, you can use parmode to choose whether to parallelize along species, algorithms or replicates, so that the load is split more evenly between the cores. In my experience, however, it is best to split the modelling into smaller steps, so you have less memory load and don't need to repeat everything over and over: run ensemble_modelling separately for each species, save the results, and then stack them with the stacking function. GBM, RF and GAM are rather heavy methods, so it may be worth splitting them up anyway.
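A minimal sketch of the per-species workflow described above might look like the following. It is an illustration under stated assumptions, not a tested fix for the error in this issue: it uses the presence-only example data shipped with SSDM (loaded with load_var and load_occ, as in the package vignette) instead of pa.lin and bioIX, swaps in the lighter GLM and CTA algorithms for speed, and the names tmp_dir, species and esdms are hypothetical. The cores and parmode values are placeholders to adapt to your machine.

```r
# Sketch: ensemble-model each species separately, then stack them.
# Assumes the SSDM package is installed; uses SSDM's bundled example data.
library(SSDM)

# Example environmental rasters and (presence-only) occurrences from SSDM.
Env <- load_var(system.file("extdata", package = "SSDM"),
                categorical = "SUBSTRATE", verbose = FALSE)
Occ <- load_occ(path = system.file("extdata", package = "SSDM"), Env,
                Xcol = "LONGITUDE", Ycol = "LATITUDE",
                file = "Occurrences.csv", sep = ",", verbose = FALSE)

# Hypothetical directory for temporary rasters (passed to the tmp argument),
# so intermediate rasters are written to disk rather than held in memory.
tmp_dir <- file.path(tempdir(), "ssdm_tmp")
dir.create(tmp_dir, showWarnings = FALSE, recursive = TRUE)

species <- unique(as.character(Occ$SPECIES))
esdms <- list()

for (sp in species) {
  # One ensemble per species keeps peak memory to a single species at a time.
  esdms[[sp]] <- ensemble_modelling(
    c("GLM", "CTA"),             # light algorithms for the sketch
    Occ[Occ$SPECIES == sp, ], Env,
    Xcol = "LONGITUDE", Ycol = "LATITUDE",
    rep = 1, name = sp,
    ensemble.thresh = 0,         # keep every model in the ensemble
    tmp = tmp_dir,
    cores = 0,                   # 0 = no parallelism; raise to use a cluster
    parmode = "algorithms",      # only relevant when cores > 0
    verbose = FALSE
  )
  gc()                           # release memory before the next species
}

# Stack the per-species ensembles into one Stacked.SDM object.
SSDM <- do.call(stacking, c(unname(esdms), list(method = "pSSDM")))
```

Because each species is modelled in its own call, a failure for one species does not force you to rerun the others, and each entry of esdms could be written to disk before moving on.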