rvalavi / blockCV

The blockCV package creates spatially or environmentally separated training and testing folds for cross-validation to provide a robust error estimation in spatially structured environments. See
https://doi.org/10.1111/2041-210X.13107
GNU General Public License v3.0
107 stars 24 forks source link

spatialAutoRange does not seem to be working properly for unprojected rasters #2

Closed ptitle closed 4 years ago

ptitle commented 5 years ago

I find that if I provide an unprojected raster to spatialAutoRange(), it finds one enormous block. I think this is incorrect.

> library(blockCV)
Loading required package: ggplot2
library(raster)

clim <- raster('CHELSA_bio_1.tif')

> library(raster)
Loading required package: sp
> 
> clim <- raster('CHELSA_bio_1.tif')
> clim
class       : RasterLayer 
dimensions  : 1766, 2287, 4038842  (nrow, ncol, ncell)
resolution  : 0.03333333, 0.03333333  (x, y)
extent      : -156.9918, -80.75847, 7.783194, 66.64986  (xmin, xmax, ymin, ymax)
coord. ref. : +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0 
data source : /Users/pascaltitle/Dropbox/jays/predictors/coarse/CHELSA_bio_1.tif 
names       : CHELSA_bio_1 
values      : -282.1801, 287.2881  (min, max)

> 
> q <- spatialAutoRange(clim)
Loading required namespace: rgeos
Regions defined for each Polygons

> q$range
[1] 282386138

screen shot 2019-02-21 at 11 37 04

rvalavi commented 5 years ago

Hi @ptitle

Thanks for reporting this. Have you had a look at the fitted variogram model? If possible could you send me your raster data for further checking?

Thanks, Roozbeh email: rvalavi@student.unimelb.edu.au

rvalavi commented 5 years ago

Dear @ptitle Indeed the problem is because of the nature of your data. If you have a look at the variogram model of your layer (the following figure that quantifies the structure of spatial autocorrelation of your layer), it does not flat out and the range cannot be identified! I recommend you to choose the block size based on other methods e.g. the range of spatial autocorrelation of your model residual. As mentioned in the associated paper the spatialAutoRange is not an absolute solution to choosing block size!

image

ptitle commented 5 years ago

Following your suggestion, I generated a correlogram to estimate at what distance spatial autocorrelation drops to zero, and then used that distance for producing spatial blocks.

# sample a set of points across the raster and generate correlogram
dat <- as.data.frame(coordinates(bio1))
dat <- cbind.data.frame(dat, val = values(bio1))
dat <- dat[complete.cases(dat),]
dat <- dat[sample(1:nrow(dat), 10000),]

co1 <- ncf::correlog(dat$x, dat$y,dat$val, increment = 1, resamp = 0, latlon = T, na.rm=TRUE)
plot(co1$mean.of.class, co1$correlation, type='b', cex=0.5, ylab='correlation', xlab='distance class')
abline(h=0, lty=3)
abline(v=co1$x.intercept, lty=3)

# the dotted vertical line shows where spatial autocorrelation = 0.
zeroSA <- co1$x.intercept #in kilometers
screen shot 2019-03-06 at 10 44 16
folds <- spatialBlock(PA, rasterLayer=bio1, theRange=zeroSA*1000, k=5, biomod2Format=F)

Using that distance for spatialBlock(), I get the following:

screen shot 2019-03-06 at 10 48 05
rvalavi commented 5 years ago

Hi @ptitle Sorry for the late response. I do not receive any warning for a message if you do not mention me in the message. Well, this is still the spatial autocorrelation in the predictor raster. If you have presence-absence data, you can fit a simple model like a GLM, calculate the residuals and check the spatial autocorrelation in the residuals (e.g. with your correlogram).

I recommend having a look at Roberts et al., 2017, Cross-validation strategies for data with temporal, spatial, hierarchical, or phylogenetic structure, Ecography. There are good recommendation and background about the impacts of spatial autocorrelation on model evaluation.