The blockCV package creates spatially or environmentally separated training and testing folds for cross-validation to provide a robust error estimation in spatially structured environments.
Hello, thank you for writing such a wonderful package. Much of my work involves setting up rectangular grids of cross-validation folds/blocks, and spatialBlock() does the trick. However, when the point data form an irregular grid, the function breaks down. Here is an example data set with my desired grid in red:
library(tidyverse)
library(sf)
library(blockCV)
# Read data & convert to sf
bw <- "https://deepblue.lib.umich.edu/data/downloads/z603qx485" %>%
  read_delim(delim = "\t") %>%
  select(treeid, gx, gy) %>%
  filter(gx >= -300) %>%
  st_as_sf(coords = c("gx", "gy"))
ggplot() +
  # Plot points:
  geom_sf(data = bw, size = 0.1) +
  # Desired 4 x 8 fold grid (23 blocks have points, modulo some bleed-in points)
  geom_vline(xintercept = seq(from = -300, to = 500, by = 100), col = "red") +
  geom_hline(yintercept = seq(from = 0, to = 400, by = 100), col = "red")
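To make the desired outcome concrete, here is a minimal sketch of the point-to-block assignment I am after. It uses plain coordinate arithmetic, not spatialBlock(). The helper assign_grid_fold() and the synthetic coordinates are hypothetical stand-ins for illustration only, and the grid origin (-300, 0), the 100-unit cell size, and the 8 x 4 layout are taken from the red lines in the plot above.

```r
# Hypothetical stand-in coordinates (NOT the real data set).
set.seed(76)
pts <- data.frame(
  gx = runif(200, min = -300, max = 500),
  gy = runif(200, min = 0, max = 400)
)

# Hypothetical helper: map each point to a cell of a regular grid.
# Cells are numbered row-major, from 1 to n_rows * n_cols.
assign_grid_fold <- function(x, y, x_min, y_min, cell_size, n_cols, n_rows) {
  # 1-based column/row index of the cell each point falls in.
  col_id <- floor((x - x_min) / cell_size) + 1
  row_id <- floor((y - y_min) / cell_size) + 1
  # Clamp points on or beyond the grid boundary into the nearest edge cell.
  col_id <- pmin(pmax(col_id, 1), n_cols)
  row_id <- pmin(pmax(row_id, 1), n_rows)
  (row_id - 1) * n_cols + col_id
}

pts$foldID <- assign_grid_fold(
  pts$gx, pts$gy,
  x_min = -300, y_min = 0, cell_size = 100, n_cols = 8, n_rows = 4
)

# Number of grid cells that contain at least one point.
length(unique(pts$foldID))
```

With the real data, the coordinates could presumably be pulled from the sf object with st_coordinates(bw) and passed to the same helper. Clamping is one way to handle the bleed-in points; dropping them would be another.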
Here is the sequence of attempts I made to assign each point a foldID by running spatialBlock(). In particular, setting k is giving me trouble.
# Chop up into 32 blocks:
bw_grid <- spatialBlock(
  speciesData = bw,
  rows = 4, cols = 8, k = 32, theRange = 100,
  verbose = FALSE, progress = FALSE, seed = 76
)
#> Warning in rasterNet(speciesData, resolution = theRange, xbin = cols, ybin =
#> rows, : The input layer has no CRS defined. Based on the extent of the input map
#> it is assumed to have a projected reference system
#> Error in spatialBlock(speciesData = bw, rows = 4, cols = 8, k = 32, theRange = 100, : 'k' is bigger than the number of spatial blocks
#> The number of spatial blocks is: 28
# Chop up into 28 blocks as indicated by the error message
bw_grid <- spatialBlock(
  speciesData = bw,
  rows = 4, cols = 8, k = 28, theRange = 100,
  verbose = FALSE, progress = FALSE, seed = 76
)
#> Warning in rasterNet(speciesData, resolution = theRange, xbin = cols, ybin =
#> rows, : The input layer has no CRS defined. Based on the extent of the input map
#> it is assumed to have a projected reference system
#> Warning in spatialBlock(speciesData = bw, rows = 4, cols = 8, k = 28, theRange =
#> 100, : The folds 11, 20 have class(es) with 0 (or less) records
#> Error in `[[<-.data.frame`(`*tmp*`, i, value = c(5, 1, 13, 18, 10, 16, : replacement has 28 rows, data has 30
# Chop up into 30 blocks as indicated by error message
bw_grid <- spatialBlock(
  speciesData = bw,
  rows = 4, cols = 8, k = 30, theRange = 100,
  verbose = FALSE, progress = FALSE, seed = 76
)
#> Warning in rasterNet(speciesData, resolution = theRange, xbin = cols, ybin =
#> rows, : The input layer has no CRS defined. Based on the extent of the input map
#> it is assumed to have a projected reference system
#> Error in spatialBlock(speciesData = bw, rows = 4, cols = 8, k = 30, theRange = 100, : 'k' is bigger than the number of spatial blocks
#> The number of spatial blocks is: 28
Could you let me know if I'm missing something obvious here?
Here is my sessionInfo(); I have installed the latest dev version of blockCV from GitHub.