tidymodels / hardhat

Construct Modeling Packages
https://hardhat.tidymodels.org

Regression in development version of hardhat when using sf objects #228

Closed mikemahoney218 closed 1 year ago

mikemahoney218 commented 1 year ago

The problem

A regression in the development version of hardhat is causing the extratests suite to fail: tuning no longer appears to work with sf objects.

With the CRAN release of hardhat, tuning succeeds:

install.packages("hardhat")
library(tidymodels)
library(spatialsample)
set.seed(7898)
folds <- spatial_clustering_cv(boston_canopy, v = 5)

tree_spec <- decision_tree(cost_complexity = tune(), tree_depth = tune()) %>%
  set_engine("rpart") %>%
  set_mode("regression")

workflow() %>%
  add_model(tree_spec) %>%
  add_formula(mean_heat_index ~ change_canopy_percentage + canopy_percentage_2019 + land_area) %>%
  tune_grid(resamples = folds, grid = 5, metrics = metric_set(rmse))
#> # Tuning results
#> # 5-fold spatial cross-validation 
#> # A tibble: 5 × 4
#>   splits            id    .metrics         .notes          
#>   <list>            <chr> <list>           <list>          
#> 1 <split [490/192]> Fold1 <tibble [5 × 6]> <tibble [0 × 3]>
#> 2 <split [513/169]> Fold2 <tibble [5 × 6]> <tibble [0 × 3]>
#> 3 <split [604/78]>  Fold3 <tibble [5 × 6]> <tibble [0 × 3]>
#> 4 <split [597/85]>  Fold4 <tibble [5 × 6]> <tibble [0 × 3]>
#> 5 <split [524/158]> Fold5 <tibble [5 × 6]> <tibble [0 × 3]>

Created on 2023-03-23 with reprex v2.0.2

With the development version of hardhat from GitHub, every model fails:

pak::pkg_install("tidymodels/hardhat")
library(tidymodels)
library(spatialsample)
set.seed(7898)
folds <- spatial_clustering_cv(boston_canopy, v = 5)

tree_spec <- decision_tree(cost_complexity = tune(), tree_depth = tune()) %>%
  set_engine("rpart") %>%
  set_mode("regression")
workflow() %>%
  add_model(tree_spec) %>%
  add_formula(mean_heat_index ~ change_canopy_percentage + canopy_percentage_2019 + land_area) %>%
  tune_grid(resamples = folds, grid = 5, metrics = metric_set(rmse))
#> → A | error:   invalid type (list) for variable 'geometry'
#> There were issues with some computations   A: x1
#> There were issues with some computations   A: x25
#> 
#> Warning: All models failed. Run `show_notes(.Last.tune.result)` for more
#> information.
#> # Tuning results
#> # 5-fold spatial cross-validation 
#> # A tibble: 5 × 4
#>   splits            id    .metrics .notes          
#>   <list>            <chr> <list>   <list>          
#> 1 <split [490/192]> Fold1 <NULL>   <tibble [5 × 3]>
#> 2 <split [513/169]> Fold2 <NULL>   <tibble [5 × 3]>
#> 3 <split [604/78]>  Fold3 <NULL>   <tibble [5 × 3]>
#> 4 <split [597/85]>  Fold4 <NULL>   <tibble [5 × 3]>
#> 5 <split [524/158]> Fold5 <NULL>   <tibble [5 × 3]>
#> 
#> There were issues with some computations:
#> 
#>   - Error(s) x25: invalid type (list) for variable 'geometry'
#> 
#> Run `show_notes(.Last.tune.result)` for more information.

Note that the geometry list column isn't used by the model itself; that is typical when fitting models to spatial data.

The closest I've come to the root cause is that hardhat seems to be calling model.frame() on the entire data frame, which then errors on the list column. Unfortunately I haven't been able to dig further, because I get a bit lost in the inner workings of tune.
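The error message itself comes from base R's model.frame(), so the failure mode can be reproduced without tidymodels. This is a sketch under assumptions, not hardhat's actual code path: a plain data frame with made-up numbers and a hypothetical list column named geometry stands in for boston_canopy, and the formula mean_heat_index ~ . is used only to mimic "calling model.frame() on the entire data frame".

```r
# Made-up stand-in for an sf object such as boston_canopy: numeric columns
# plus a list column named `geometry` (sf stores geometries in a list
# column of class sfc).
df <- data.frame(
  mean_heat_index = c(30.1, 31.4, 29.8),
  canopy_percentage_2019 = c(12, 25, 40),
  land_area = c(1.2, 0.8, 2.5)
)
df$geometry <- list(c(0, 0), c(1, 1), c(2, 2))

# Only the variables named in the formula are evaluated, so the list
# column is never touched and this call succeeds.
mf_ok <- model.frame(
  mean_heat_index ~ canopy_percentage_2019 + land_area,
  data = df
)

# `.` expands to every other column, including `geometry`, and
# model.frame() rejects list-typed variables with an error.
msg <- tryCatch(
  {
    model.frame(mean_heat_index ~ ., data = df)
    NA_character_
  },
  error = function(e) conditionMessage(e)
)

print(dim(mf_ok))
print(msg)
```

The first call mirrors the case where only the formula's columns are evaluated; the second produces the same message reported in the tuning notes, "invalid type (list) for variable 'geometry'".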

mikemahoney218 commented 1 year ago

A tiny bit of further digging: the regression first appears in 94cfbc9, while 55e303b (the commit immediately before it) runs fine:

pak::pkg_install("tidymodels/hardhat@55e303b")
library(tidymodels)
library(spatialsample)
set.seed(7898)
folds <- spatial_clustering_cv(boston_canopy, v = 5)

tree_spec <- decision_tree(cost_complexity = tune(), tree_depth = tune()) %>%
  set_engine("rpart") %>%
  set_mode("regression")
workflow() %>%
  add_model(tree_spec) %>%
  add_formula(mean_heat_index ~ change_canopy_percentage + canopy_percentage_2019 + land_area) %>%
  tune_grid(resamples = folds, grid = 5, metrics = metric_set(rmse))
#> # Tuning results
#> # 5-fold spatial cross-validation 
#> # A tibble: 5 × 4
#>   splits            id    .metrics         .notes          
#>   <list>            <chr> <list>           <list>          
#> 1 <split [490/192]> Fold1 <tibble [5 × 6]> <tibble [0 × 3]>
#> 2 <split [513/169]> Fold2 <tibble [5 × 6]> <tibble [0 × 3]>
#> 3 <split [604/78]>  Fold3 <tibble [5 × 6]> <tibble [0 × 3]>
#> 4 <split [597/85]>  Fold4 <tibble [5 × 6]> <tibble [0 × 3]>
#> 5 <split [524/158]> Fold5 <tibble [5 × 6]> <tibble [0 × 3]>

At 94cfbc9, every model fails:

pak::pkg_install("tidymodels/hardhat@94cfbc9")
library(tidymodels)
library(spatialsample)
set.seed(7898)
folds <- spatial_clustering_cv(boston_canopy, v = 5)

tree_spec <- decision_tree(cost_complexity = tune(), tree_depth = tune()) %>%
  set_engine("rpart") %>%
  set_mode("regression")
workflow() %>%
  add_model(tree_spec) %>%
  add_formula(mean_heat_index ~ change_canopy_percentage + canopy_percentage_2019 + land_area) %>%
  tune_grid(resamples = folds, grid = 5, metrics = metric_set(rmse))
#> → A | error:   invalid type (list) for variable 'geometry'
#> There were issues with some computations   A: x1
#> There were issues with some computations   A: x25
#> 
#> Warning: All models failed. Run `show_notes(.Last.tune.result)` for more
#> information.
#> # Tuning results
#> # 5-fold spatial cross-validation 
#> # A tibble: 5 × 4
#>   splits            id    .metrics .notes          
#>   <list>            <chr> <list>   <list>          
#> 1 <split [490/192]> Fold1 <NULL>   <tibble [5 × 3]>
#> 2 <split [513/169]> Fold2 <NULL>   <tibble [5 × 3]>
#> 3 <split [604/78]>  Fold3 <NULL>   <tibble [5 × 3]>
#> 4 <split [597/85]>  Fold4 <NULL>   <tibble [5 × 3]>
#> 5 <split [524/158]> Fold5 <NULL>   <tibble [5 × 3]>
#> 
#> There were issues with some computations:
#> 
#>   - Error(s) x25: invalid type (list) for variable 'geometry'
#> 
#> Run `show_notes(.Last.tune.result)` for more information.

DavisVaughan commented 1 year ago

I've figured it out, thanks.

github-actions[bot] commented 1 year ago

This issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.