rvalavi / blockCV

The blockCV package creates spatially or environmentally separated training and testing folds for cross-validation to provide robust error estimation in spatially structured environments. See https://doi.org/10.1111/2041-210X.13107
GNU General Public License v3.0
106 stars, 22 forks

Error when using spatialAutoRange() for coastal species #3

Closed. DomHenry closed this issue 4 years ago.

DomHenry commented 4 years ago

Dear @rvalavi,

First up, well done on developing this package and the associated MEE paper. blockCV looks like a promising validation technique for the SDMs I'm building (especially given its easy integration with biomod2 functions). I'm really enjoying it so far.

I've been running the blockCV functions on a number of reptile and amphibian species, and for the most part they work. However, for several species (most of which occur along a coastline), spatialAutoRange throws the following error when I use the default value of the sampleNumber argument:

There are 11 raster layers
Error in { : task 1 failed - "cannot take a sample larger than the population when 'replace = FALSE'"

If I reduce sampleNumber to a value sufficiently below the total number of cells in the environmental rasters, the function works. The problem is that I'm running the blockCV functions on more than 50 species, and I can't find this threshold by trial and error for each one.

For example, ncell() for the species 1 raster layers is 342. If I reduce sampleNumber to 319, spatialAutoRange returns a valid result, but any value from 320 to 342 produces the error above. For species 2 the rasters have 4959 cells, and the function only works if sampleNumber < 3367.
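One plausible explanation, which is an assumption and not something confirmed in this thread, is that ncell() counts every cell, including NA cells (for example, sea cells along a coastline), whereas cells can only be sampled from those that hold values. Sampling without replacement then fails as soon as sampleNumber exceeds the number of non-NA cells, which would fit a hard threshold like 319 for species 1. The minimal sketch below is written in Python rather than R. It models a raster as a flat list with None standing in for NA, and the helpers count_valid, sample_cells and safe_sample_number, as well as the toy raster, are hypothetical illustrations rather than blockCV code.

```python
import random


def count_valid(cells):
    """Count non-NA cells; None stands in for NA (e.g. sea cells)."""
    return sum(1 for v in cells if v is not None)


def sample_cells(cells, sample_number, seed=None):
    """Sample cell indices without replacement, from non-NA cells only.

    Raises ValueError when sample_number exceeds the number of non-NA
    cells, analogous to R's "cannot take a sample larger than the
    population when 'replace = FALSE'" error.
    """
    valid = [i for i, v in enumerate(cells) if v is not None]
    rng = random.Random(seed)
    return rng.sample(valid, sample_number)


def safe_sample_number(cells, requested):
    """Cap the requested sample size at the number of non-NA cells."""
    return min(requested, count_valid(cells))


# Hypothetical toy raster: 342 cells in total, 23 of them NA.
raster = [float(i) for i in range(319)] + [None] * 23

print(len(raster))          # total cells, like ncell(): 342
print(count_valid(raster))  # cells that can actually be sampled: 319

try:
    sample_cells(raster, 342, seed=1)
except ValueError as err:
    print("sampling failed:", err)

n = safe_sample_number(raster, 5000)
print(n, len(sample_cells(raster, n, seed=1)))  # 319 319
```

Capping the sample size at the number of non-NA cells is one simple guard. The fix described later in this thread was made inside the package itself.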

The only commonality is that these species occur along a coastline. Below are plots of occurrence points overlaid on a raster layer with a coastline shapefile.

[Map: B. bagginsi]

[Map: A. landdrosia]

Do you have any idea what could be going on here?

I realise that it's difficult to create a reprex here because of the amount of data needed to run the functions. Please let me know if you'd like me to send you data and code that reproduce the error.

In all trials I called the function as follows:

sac <- spatialAutoRange(rasterLayer = envstack,
                        sampleNumber = ncell(envstack),
                        doParallel = TRUE,
                        nCores = NULL,
                        plotVariograms = FALSE,
                        showPlots = FALSE,
                        progress = TRUE)

P.S. I know I can use the speciesData argument as an alternative to sampleNumber, but I'd rather not go that route for now.

Thanks, Dominic

rvalavi commented 4 years ago

Dear @DomHenry

Thank you for reporting this problem.

This function works by sampling a given number of cells (sampleNumber) from the rasters and fitting variogram models to estimate the range of spatial autocorrelation. You are right: the function failed when the number of cells was too low. I've made some changes and updated the package, so you just need to reinstall it from GitHub.
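To make the description above more concrete, here is a minimal Python sketch of the empirical step behind a variogram. It is not blockCV's implementation. For a set of randomly sampled cells, the semivariance in each distance bin is half the mean squared difference between pairs of values. blockCV fits variogram models to estimate the range. The crude_range helper below is a hypothetical shortcut that simply returns the first bin whose semivariance reaches 95% of the maximum. The bin width, maximum distance and synthetic field are illustrative assumptions.

```python
import math
import random


def empirical_semivariogram(points, bin_width, max_dist):
    """Binned empirical semivariogram.

    points: list of (x, y, z) tuples, e.g. randomly sampled raster cells.
    Returns (bin_centre, semivariance, n_pairs) for each non-empty bin,
    where semivariance = sum((z_i - z_j)**2) / (2 * n_pairs).
    """
    n_bins = int(math.ceil(max_dist / bin_width))
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for i in range(len(points)):
        xi, yi, zi = points[i]
        for j in range(i + 1, len(points)):
            xj, yj, zj = points[j]
            d = math.hypot(xi - xj, yi - yj)
            if d >= max_dist:
                continue  # ignore pairs beyond the maximum lag distance
            b = min(int(d // bin_width), n_bins - 1)
            sums[b] += (zi - zj) ** 2
            counts[b] += 1
    return [((b + 0.5) * bin_width, sums[b] / (2 * counts[b]), counts[b])
            for b in range(n_bins) if counts[b] > 0]


def crude_range(variogram, fraction=0.95):
    """First lag at which semivariance reaches `fraction` of its maximum.

    A rough, hypothetical stand-in for the range of a fitted variogram model.
    """
    sill = max(g for _, g, _ in variogram)
    for h, g, _ in variogram:
        if g >= fraction * sill:
            return h
    return None


rng = random.Random(42)
# Hypothetical field on a 30 x 30 grid: smooth pattern plus a little noise.
cells = [(x, y, math.sin(x / 4.0) + math.cos(y / 4.0) + rng.gauss(0, 0.1))
         for x in range(30) for y in range(30)]
sample = rng.sample(cells, 300)  # like drawing sampleNumber cells
vg = empirical_semivariogram(sample, bin_width=2.0, max_dist=20.0)
print(crude_range(vg))
```

The pairwise loop is quadratic in the number of sampled cells. That cost is one reason to work with a random sample of cells rather than every cell in a large raster.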

I do not recommend using the speciesData argument here unless you have thousands of records distributed across the whole landscape.

Cheers, Roozbeh

DomHenry commented 4 years ago

Dear @rvalavi,

Thank you so much for the fix. I've tested it on several species and the function appears to be working perfectly.

Cheers, Dominic

rvalavi commented 4 years ago

@DomHenry great to hear that! Thanks.