Error using tune_race_anova() with spatialsample objects and parallel computing

The problem

I'm having trouble using tune_race_anova() with a spatial_block_cv object from the spatialsample package when running in parallel. The call fails with "There were no valid metrics for the ANOVA model.", and show_notes(.Last.tune.result) reports: Error in FUN(): ! x must be a vector, not a <sfc_POINT/sfc> object. The error does not occur if I don't use parallel computing.

I get a similar error when using the tune_grid() function from the tune package. However, if I specify control = control_grid(pkgs = "sf"), the tune_grid() function will work with the spatial_block_cv object and parallel computing.

I think the issue is that the pkgs value supplied via control = control_race(pkgs = "sf") in tune_race_anova() is not being passed on to control$pkgs (line 232 in the function code). I can get tune_race_anova() to work with parallel computing and a spatial_block_cv object if I modify the function to include "sf" in the list of packages passed to control$pkgs.
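A quick way to check whether the workers can load sf at all, independently of how control$pkgs is forwarded, is to load the namespace on each PSOCK worker directly. The sketch below uses only base R's parallel package. check_namespaces() is a hypothetical helper name (it is not part of tune or finetune), the two-worker cluster is an arbitrary choice, and this has not been verified as a fix for the error above.

```r
library(parallel)

# Hypothetical helper (not part of tune or finetune): try to load each
# package's namespace and report whether it succeeded.
check_namespaces <- function(pkgs) {
  vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
}

# Small throwaway cluster; the worker count of 2 is an arbitrary assumption
cl <- makePSOCKcluster(2)

# Run the check on every worker; returns one named logical vector per worker
loaded <- clusterCall(cl, check_namespaces, "sf")

stopCluster(cl)

# FALSE for a worker means sf could not be loaded there
print(loaded)
```

If every entry is TRUE, the sf namespace can be loaded on the workers. Running the same clusterCall() on the cluster that is registered for tuning, before calling tune_race_anova(), might be worth trying as a stopgap, though whether it is sufficient here is untested.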

The reprex first tunes with tune_grid(), with and without control = control_grid(pkgs = "sf"), to show the error and how it is addressed. It then tunes with tune_race_anova(), with and without control = control_race(pkgs = "sf"), to show that the error remains.

I did have some success using the workaround suggested for #39, which I have included at the end of the reprex.

Thanks for your help!

Reproducible example

# Load packages and prepare data ------------------------------------------

library(tidymodels)
library(spatialsample)
library(finetune)
library(sf)
#> Linking to GEOS 3.9.3, GDAL 3.5.2, PROJ 8.2.1; sf_use_s2() is TRUE


# Function to clean up parallel computing backends (if needed), from:
# https://stackoverflow.com/questions/64519640/error-in-summary-connectionconnection-invalid-connection
unregister_dopar <- function() {
  env <- foreach:::.foreachGlobals
  rm(list = ls(name = env), pos = env)
}

# Data
data("ames", package = "modeldata")

# Convert to sf object for spatial resampling
ames_sf <- sf::st_as_sf(
  x = ames[1:200, ],
  coords = c("Longitude", "Latitude"),
  crs = 4326
)

# Resampling --------------------------------------------------------------

# Spatial resampling using the spatialsample package
spatial_block_folds <- spatial_block_cv(ames_sf, v = 5)

# Model specification -----------------------------------------------------

bart_spec <-
  parsnip::bart(trees = tune()) |>
  set_mode("regression")

bart_rec <-
  recipe(Sale_Price ~ Year_Built + Bldg_Type + Gr_Liv_Area,
         data = ames)

bart_wflow <-
  workflow() |>
  add_model(bart_spec) |>
  add_recipe(bart_rec)

# Grid tuning using the tune package --------------------------------------

## Grid tuning without control - gives an error message
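cores <- parallel::detectCores(logical = FALSE)
cl <- parallel::makePSOCKcluster(cores)
# Register the cluster with foreach (doParallel backend assumed)
doParallel::registerDoParallel(cl)

tune_grid_no_control <-
  bart_wflow |>
  tune_grid(
    resamples = spatial_block_folds
  )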
#> Warning: All models failed. Run `show_notes(.Last.tune.result)` for more
#> information.

# Show what the error message was
show_notes(.Last.tune.result)
#> unique notes:
#> ─────────────────────────────────────────────────────
#> Error in `FUN()`:
#> ! `x` must be a vector, not a <sfc_POINT/sfc> object.


## Grid tuning specifying the sf package in control - no error message
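cl <- parallel::makePSOCKcluster(cores)
# Register the cluster with foreach (doParallel backend assumed)
doParallel::registerDoParallel(cl)

tune_grid_with_control <-
  bart_wflow |>
  tune_grid(
    resamples = spatial_block_folds,
    control = control_grid(pkgs = "sf")
  )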


# Racing method using the finetune package --------------------------------

## Racing method without control - gives the same error message
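cl <- parallel::makePSOCKcluster(cores)
# Register the cluster with foreach (doParallel backend assumed)
doParallel::registerDoParallel(cl)

tune_race_no_control <-
  bart_wflow |>
  tune_race_anova(
    resamples = spatial_block_folds
  )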
#> Warning: All models failed. Run `show_notes(.Last.tune.result)` for more
#> information.
#> Error in `test_parameters_gls()`:
#> ! There were no valid metrics for the ANOVA model.
#> Backtrace:
#>     ▆
#>  1. ├─finetune::tune_race_anova(bart_wflow, resamples = spatial_block_folds)
#>  2. └─finetune:::tune_race_anova.workflow(bart_wflow, resamples = spatial_block_folds)
#>  3.   └─finetune:::tune_race_anova_workflow(...)
#>  4.     └─finetune:::test_parameters_gls(res, control$alpha)
#>  5.       └─rlang::abort("There were no valid metrics for the ANOVA model.")

# Show what the error message was
show_notes(.Last.tune.result)
#> unique notes:
#> ─────────────────────────────────────────────────────
#> Error in `FUN()`:
#> ! `x` must be a vector, not a <sfc_POINT/sfc> object.


## Racing method specifying the sf package in control - error remains
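cl <- parallel::makePSOCKcluster(cores)
# Register the cluster with foreach (doParallel backend assumed)
doParallel::registerDoParallel(cl)

tune_race_with_control <-
  bart_wflow |>
  tune_race_anova(
    resamples = spatial_block_folds,
    control = control_race(pkgs = "sf")
  )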
#> Warning: All models failed. Run `show_notes(.Last.tune.result)` for more
#> information.
#> Error in `test_parameters_gls()`:
#> ! There were no valid metrics for the ANOVA model.
#> Backtrace:
#>     ▆
#>  1. ├─finetune::tune_race_anova(...)
#>  2. └─finetune:::tune_race_anova.workflow(...)
#>  3.   └─finetune:::tune_race_anova_workflow(...)
#>  4.     └─finetune:::test_parameters_gls(res, control$alpha)
#>  5.       └─rlang::abort("There were no valid metrics for the ANOVA model.")

# Show what the error message was
show_notes(.Last.tune.result)
#> unique notes:
#> ─────────────────────────────────────────────────────
#> Error in `FUN()`:
#> ! `x` must be a vector, not a <sfc_POINT/sfc> object.


## Racing method using result from tune_grid() as the initial argument 

# This comes from the suggested workaround for another issue. See: 
# https://github.com/tidymodels/finetune/issues/39#issuecomment-1132266958

# I think this works:
cl <- parallel::makePSOCKcluster(cores)
# Register the cluster with foreach (doParallel backend assumed)
doParallel::registerDoParallel(cl)

bart_rs <-
  bart_wflow |>
  tune_grid(
    resamples = spatial_block_folds,
    control = control_grid(pkgs = "sf"),
    grid = 3
  )

tune_race_init <-
  bart_wflow |>
  tune_race_anova(
    resamples = spatial_block_folds,
    iter = 3,
    initial = bart_rs
  )
#> Warning: The `...` are not used in this function but one or more objects were
#> passed: 'iter', 'initial'


Created on 2023-05-30 with reprex v2.0.2

Thanks for the issue, @jdberson!

Does https://github.com/tidymodels/finetune/pull/100 do the trick for you? You can install it with devtools::install_github("tidymodels/finetune#100").

Hi @simonpcouch

Thanks very much for looking into this. Unfortunately I'm still getting an error message when using parallel computing.

I've included a reprex in case it's useful.

library(tidymodels)
library(spatialsample)
library(finetune)
library(sf)
#> Linking to GEOS 3.11.2, GDAL 3.7.2, PROJ 9.3.0; sf_use_s2() is TRUE

# Data
data("ames", package = "modeldata")

# Convert to sf object for spatial resampling
ames_sf <- sf::st_as_sf(
  x = ames[1:200, ],
  coords = c("Longitude", "Latitude"),
  crs = 4326
)

# Spatial resampling using the spatialsample package
spatial_block_folds <- spatial_block_cv(ames_sf, v = 5)

# Workflow
bart_spec <-
  parsnip::bart(trees = tune()) |>
  set_mode("regression")

bart_rec <-
  recipe(Sale_Price ~ Year_Built + Bldg_Type + Gr_Liv_Area,
    data = ames
  )

bart_wflow <-
  workflow() |>
  add_model(bart_spec) |>
  add_recipe(bart_rec)

# Tuning using tune_race_anova()
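cores <- parallel::detectCores(logical = FALSE)
cl <- parallel::makePSOCKcluster(cores)
# Register the cluster with foreach (doParallel backend assumed)
doParallel::registerDoParallel(cl)

tune_race_with_control <-
  bart_wflow |>
  tune_race_anova(
    resamples = spatial_block_folds,
    control = control_race(pkgs = "sf")
  )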
#> Warning: All models failed. Run `show_notes(.Last.tune.result)` for more
#> information.
#> Error in `test_parameters_gls()`:
#> ! There were no valid metrics for the ANOVA model.
#> Backtrace:
#>     ▆
#>  1. ├─finetune::tune_race_anova(...)
#>  2. └─finetune:::tune_race_anova.workflow(...)
#>  3.   └─finetune:::tune_race_anova_workflow(...)
#>  4.     └─finetune:::test_parameters_gls(res, control$alpha, opt_metric_time)
#>  5.       └─cli::cli_abort("There were no valid metrics for the ANOVA model.")
#>  6.         └─rlang::abort(...)

# Show more information
show_notes(.Last.tune.result)
#> unique notes:
#> ─────────────────────────────────────────────────────
#> Error in `vec_size()`:
#> ! `x` must be a vector, not a <sfc_POINT/sfc> object.

Created on 2024-01-22 with reprex v2.0.2

Ah, I see. #74 should do the trick. :)