rvalavi / blockCV

The blockCV package creates spatially or environmentally separated training and testing folds for cross-validation to provide a robust error estimation in spatially structured environments. See https://doi.org/10.1111/2041-210X.13107
GNU General Public License v3.0

Setting up grid of cross-validation folds/blocks for irregular shaped point data #14

Closed rudeboybert closed 3 years ago

rudeboybert commented 4 years ago

Hello, and thank you for writing such a wonderful package. Much of my work involves setting up rectangular grids of cross-validation folds/blocks, and spatialBlock() does the trick. However, when the point data form an irregular grid, the function breaks down. Here is an example data set with my desired grid in red:

library(tidyverse)
library(sf)
library(blockCV)

# Read data & convert to sf
bw <- "https://deepblue.lib.umich.edu/data/downloads/z603qx485" %>%
  read_delim(delim = "\t") %>%
  select(treeid, gx, gy) %>%
  filter(gx >= -300) %>%
  st_as_sf(coords = c("gx", "gy"))

ggplot() +
  # Plot points:
  geom_sf(data = bw, size = 0.1) +
  # Desired 4 x 8 fold grid (23 blocks have points modulo some bleed in points)
  geom_vline(xintercept = seq(from = -300, to = 500, by = 100), col = "red") +
  geom_hline(yintercept = seq(from = 0, to = 400, by = 100), col = "red")

[Figure: point locations with the desired grid overlaid in red]
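
As a quick check of how many cells of the desired grid actually contain points, the cell indices can be computed directly from the coordinates. The following is a minimal base-R sketch on made-up coordinates; the `xy` data frame and the helper name `cell_index` are assumptions for illustration and are not part of blockCV. Only the grid origin (-300, 0), the 100-unit cell size, and the 8 columns come from the plot above.

```r
# Hypothetical helper: map coordinates to a 1-based cell index along one axis.
cell_index <- function(v, origin, size) floor((v - origin) / size) + 1

# Made-up coordinates standing in for gx/gy; this is NOT the real data set.
xy <- data.frame(
  gx = c(-250, -120,  30,  30, 410, 499),
  gy = c(  20,  350, 110, 190,   5, 399)
)

# Column and row of the 100 x 100 cell each point falls into.
col_id <- cell_index(xy$gx, origin = -300, size = 100)
row_id <- cell_index(xy$gy, origin = 0,    size = 100)

# One integer ID per grid cell, numbered row by row across 8 columns.
n_cols <- 8
block_id <- (row_id - 1) * n_cols + col_id

# Number of cells that contain at least one point.
n_occupied <- length(unique(block_id))
```

Note that a point lying exactly on the upper edge of the grid (for example gx = 500) would land in a ninth column under this scheme, so edge points may need separate handling.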

Here is the sequence of attempts I made to associate each point with a fold ID by running spatialBlock(). In particular, setting k is giving me trouble.

# Chop up into 32 blocks:
bw_grid <- spatialBlock(
  speciesData = bw,
  rows = 4, cols = 8, k = 32, theRange = 100,
  verbose = FALSE, progress = FALSE, seed = 76
)
#> Warning in rasterNet(speciesData, resolution = theRange, xbin = cols, ybin =
#> rows, : The input layer has no CRS defined. Based on the extent of the input map
#> it is assumed to have a projected reference system
#> Error in spatialBlock(speciesData = bw, rows = 4, cols = 8, k = 32, theRange = 100, : 'k' is bigger than the number of spatial blocks
#> The number of spatial blocks is: 28

# Chop up into 28 blocks as indicated by the error message
bw_grid <- spatialBlock(
  speciesData = bw,
  rows = 4, cols = 8, k = 28, theRange = 100,
  verbose = FALSE, progress = FALSE, seed = 76
)
#> Warning in rasterNet(speciesData, resolution = theRange, xbin = cols, ybin =
#> rows, : The input layer has no CRS defined. Based on the extent of the input map
#> it is assumed to have a projected reference system
#> Warning in spatialBlock(speciesData = bw, rows = 4, cols = 8, k = 28, theRange =
#> 100, : The folds 11, 20 have class(es) with 0 (or less) records
#> Error in `[[<-.data.frame`(`*tmp*`, i, value = c(5, 1, 13, 18, 10, 16, : replacement has 28 rows, data has 30

# Chop up into 30 blocks as indicated by error message
bw_grid <- spatialBlock(
  speciesData = bw,
  rows = 4, cols = 8, k = 30, theRange = 100,
  verbose = FALSE, progress = FALSE, seed = 76
)
#> Warning in rasterNet(speciesData, resolution = theRange, xbin = cols, ybin =
#> rows, : The input layer has no CRS defined. Based on the extent of the input map
#> it is assumed to have a projected reference system
#> Error in spatialBlock(speciesData = bw, rows = 4, cols = 8, k = 30, theRange = 100, : 'k' is bigger than the number of spatial blocks
#> The number of spatial blocks is: 28

Could you let me know if I'm missing something obvious here?

Here is my sessionInfo(). I have installed the latest dev version of blockCV from GitHub.

sessionInfo()
#> R version 4.0.1 (2020-06-06)
#> Platform: x86_64-apple-darwin17.0 (64-bit)
#> Running under: macOS Catalina 10.15.6
#> 
#> Matrix products: default
#> BLAS:   /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRblas.dylib
#> LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib
#> 
#> locale:
#> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> loaded via a namespace (and not attached):
#>  [1] compiler_4.0.1  magrittr_1.5    tools_4.0.1     htmltools_0.5.0
#>  [5] yaml_2.2.1      stringi_1.4.6   rmarkdown_2.3   highr_0.8      
#>  [9] knitr_1.29      stringr_1.4.0   xfun_0.16       digest_0.6.25  
#> [13] rlang_0.4.7     evaluate_0.14

rvalavi commented 3 years ago

old comment

rvalavi commented 3 years ago

I checked the data and couldn't find out why it causes this problem.
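
For a purely rectangular layout, one possible workaround is to skip spatialBlock() and assign folds by hand: compute a block label per point, then map each occupied block to a fold. The sketch below uses base R only and made-up coordinates. `assign_block_folds` and its arguments are hypothetical names, and this is not how blockCV assigns folds internally.

```r
# Hypothetical helper: assign each point to a fold via its rectangular block.
assign_block_folds <- function(x, y, x0, y0, size, k, seed = 1) {
  # Block label built from the column and row index of the containing cell.
  block <- paste(floor((x - x0) / size), floor((y - y0) / size), sep = "_")
  occupied <- unique(block)
  if (k > length(occupied)) {
    stop("k exceeds the number of occupied blocks: ", length(occupied))
  }
  set.seed(seed)
  # Shuffle the occupied blocks, then deal them into k folds in turn.
  fold_of_block <- rep_len(seq_len(k), length(occupied))
  names(fold_of_block) <- sample(occupied)
  # Look up the fold of each point through its block label.
  unname(fold_of_block[block])
}

# Made-up coordinates, not the real data set.
x <- c(-250, -240, -120,  30, 410, 499)
y <- c(  20,   30,  350, 110,   5, 399)
fold_id <- assign_block_folds(x, y, x0 = -300, y0 = 0, size = 100, k = 5)
```

Because only occupied blocks are counted, k can never refer to an empty block. Setting k to the number of occupied blocks gives leave-one-block-out folds.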