`spatial_clustering_cv` retains geometry in folds causing `fit_resamples` to fail #159

The problem

When I create spatial resamples with `spatial_clustering_cv()`, the geometry column is retained in the folds. `fit_resamples()` then fails with an error saying that not all columns of `y` are known outcome types. I'm not sure whether `spatial_clustering_cv()` should drop the spatial information from the folds or whether `fit_resamples()` should ignore the geometry column. I might be missing something.
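For context, the error message is consistent with how sf treats column subsetting: the geometry column is "sticky" and stays attached unless it is dropped explicitly. The sketch below only illustrates that sf behaviour on the same `nc` data. Whether `add_variables()` selects the outcome this way internally is an assumption, not something confirmed in this thread.

```r
library(sf)

nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)

# Subsetting one column of an sf object keeps the geometry column as well
with_geom <- names(nc["BIR74"])
with_geom

# Dropping the geometry first leaves a plain data frame with only that column
without_geom <- names(st_drop_geometry(nc)["BIR74"])
without_geom
```

The first call returns both `BIR74` and `geometry`, the second only `BIR74`, which would match an outcome frame that unexpectedly carries a `geometry` column.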

Reproducible example

# Load packages
library(sf)
#> Linking to GEOS 3.12.1, GDAL 3.8.4, PROJ 9.3.1; sf_use_s2() is TRUE
library(dplyr, warn.conflicts = FALSE)
library(spatialsample)
library(workflows)
library(parsnip)
library(tune)

# Example data
nc <- st_read(system.file("shape/nc.shp", package="sf"), quiet = TRUE)

# Making spatial clusters
nc_folds <- spatial_clustering_cv(nc, v = 5)

# Workflow for linear regression
lr_recipe <- workflow() %>%
  add_variables(outcomes = BIR74,
                predictors = AREA) %>%
  add_model(linear_reg(engine = "lm"))

# Tuning parameters: Fail
(spatial_lr <- fit_resamples(lr_recipe, nc_folds))
#> → A | error:   Not all columns of `y` are known outcome types. These columns have unknown types: 'geometry'.
#> There were issues with some computations   A: x1
#> There were issues with some computations   A: x5
#> Warning: All models failed. Run `show_notes(.Last.tune.result)` for more
#> information.
#> # Resampling results
#> # 5-fold spatial cross-validation 
#> # A tibble: 5 × 4
#>   splits          id    .metrics .notes          
#>   <list>          <chr> <list>   <list>          
#> 1 <split [77/23]> Fold1 <NULL>   <tibble [1 × 3]>
#> 2 <split [75/25]> Fold2 <NULL>   <tibble [1 × 3]>
#> 3 <split [79/21]> Fold3 <NULL>   <tibble [1 × 3]>
#> 4 <split [84/16]> Fold4 <NULL>   <tibble [1 × 3]>
#> 5 <split [85/15]> Fold5 <NULL>   <tibble [1 × 3]>
#> There were issues with some computations:
#>   - Error(s) x5: Not all columns of `y` are known outcome types. These columns hav...
#> Run `show_notes(.Last.tune.result)` for more information.

# Best tuning parameters: Fail
collect_metrics(spatial_lr)
#> Error in `estimate_tune_results()`:
#> ! All models failed. Run `show_notes(.Last.tune.result)` for more information.

# Try with st_drop_geometry:
orig_class <- class(nc_folds)

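# Drop the geometry from the data stored in each split and return the split
nc_folds <- nc_folds %>% 
  mutate(splits = purrr::map(splits, ~ {
    .x$data <- st_drop_geometry(.x$data)
    .x
  }))

# mutate() drops the rset class, so restore it
class(nc_folds) <- orig_class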

# Tuning parameters
(spatial_lr <- fit_resamples(lr_recipe, nc_folds))
#> # Resampling results
#> # -fold spatial cross-validation 
#> # A tibble: 5 × 4
#>   splits          id    .metrics         .notes          
#>   <list>          <chr> <list>           <list>          
#> 1 <split [77/23]> Fold1 <tibble [2 × 4]> <tibble [0 × 3]>
#> 2 <split [75/25]> Fold2 <tibble [2 × 4]> <tibble [0 × 3]>
#> 3 <split [79/21]> Fold3 <tibble [2 × 4]> <tibble [0 × 3]>
#> 4 <split [84/16]> Fold4 <tibble [2 × 4]> <tibble [0 × 3]>
#> 5 <split [85/15]> Fold5 <tibble [2 × 4]> <tibble [0 × 3]>

# Best tuning parameters
collect_metrics(spatial_lr)
#> # A tibble: 2 × 6
#>   .metric .estimator     mean     n  std_err .config             
#>   <chr>   <chr>         <dbl> <int>    <dbl> <chr>               
#> 1 rmse    standard   3542.        5 634.     Preprocessor1_Model1
#> 2 rsq     standard      0.178     5   0.0616 Preprocessor1_Model1

Created on 2024-07-19 with reprex v2.1.1
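As a quick check of the claim that the folds keep the geometry, the analysis set of a single split can be inspected directly. This is a minimal sketch that assumes `rsample::analysis()` returns the split data unchanged for these folds. The name `first_analysis` is only illustrative.

```r
library(sf)
library(spatialsample)
library(rsample)

nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)
nc_folds <- spatial_clustering_cv(nc, v = 5)

# Pull the analysis (training) data out of the first split
first_analysis <- analysis(nc_folds$splits[[1]])

# Is it still an sf object, and does it still carry the geometry column?
inherits(first_analysis, "sf")
"geometry" %in% names(first_analysis)
```

If both checks are `TRUE`, the data handed to the model in each fold is still an sf object, which fits the error above.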

Try using `add_formula()` instead of `add_variables()` as a workaround.

(Sorry for the brief reply -- I'm traveling at the moment so can't run stuff, but wanted to make sure I could try to help you get unstuck. This is definitely a bug somewhere)

Interesting. If I use `add_formula()` instead, it does work.

# Load packages
library(sf)
#> Linking to GEOS 3.12.1, GDAL 3.8.4, PROJ 9.3.1; sf_use_s2() is TRUE
library(dplyr, warn.conflicts = FALSE)
library(spatialsample)
library(workflows)
library(parsnip)
library(tune)

# Example data
nc <- st_read(system.file("shape/nc.shp", package="sf"), quiet = TRUE)

# Making spatial clusters
nc_folds <- spatial_clustering_cv(nc, v = 5)

# Workflow for linear regression
lr_recipe <- workflow() %>%
  add_formula(BIR74 ~ AREA) %>%
  add_model(linear_reg(engine = "lm"))

# Tuning parameters
(spatial_lr <- fit_resamples(lr_recipe, nc_folds))
#> # Resampling results
#> # 5-fold spatial cross-validation 
#> # A tibble: 5 × 4
#>   splits          id    .metrics         .notes          
#>   <list>          <chr> <list>           <list>          
#> 1 <split [79/21]> Fold1 <tibble [2 × 4]> <tibble [0 × 3]>
#> 2 <split [75/25]> Fold2 <tibble [2 × 4]> <tibble [0 × 3]>
#> 3 <split [77/23]> Fold3 <tibble [2 × 4]> <tibble [0 × 3]>
#> 4 <split [85/15]> Fold4 <tibble [2 × 4]> <tibble [0 × 3]>
#> 5 <split [84/16]> Fold5 <tibble [2 × 4]> <tibble [0 × 3]>

# Best tuning parameters:
collect_metrics(spatial_lr)
#> # A tibble: 2 × 6
#>   .metric .estimator     mean     n  std_err .config             
#>   <chr>   <chr>         <dbl> <int>    <dbl> <chr>               
#> 1 rmse    standard   3542.        5 634.     Preprocessor1_Model1
#> 2 rsq     standard      0.178     5   0.0616 Preprocessor1_Model1

Created on 2024-07-22 with reprex v2.1.1
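One possible reason the formula interface avoids the problem is that a formula only pulls the variables it names, so the geometry column is never selected. The sketch below uses base R's `model.frame()` as a stand-in to show this on the same data. Whether workflows follows exactly this path internally is an assumption.

```r
library(sf)

nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)

# A formula evaluates only the variables it mentions, even on an sf object
mf <- model.frame(BIR74 ~ AREA, data = nc)
names(mf)
```

The resulting frame contains only `BIR74` and `AREA`, with no `geometry` column to be mistaken for an outcome.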