Closed jamesgrecian closed 1 year ago
I'm personally not super familiar with mgcv, so I asked around for help on this one, and @simonpcouch was awesome enough to explain what's going on here for me :smile: That also means I can't give a ton of other details here myself (though I can ask around and try to find out!) as I don't know the internals of what's happening here.
But, basically: when using mgcv you need to provide two formula arguments. The first one in add_model()
is the formula that mgcv will see, and that's where you define your smooths. The second formula goes in add_formula()
, and that's what functions from recipes (and other pre-processors) will see. Because pre-processors don't understand what to do with objects produced by s()
, that formula needs to not include s()
.
So, to make this work, try the following:
library(tidymodels)
library(spatialsample)
gam_spec <- gen_additive_mod(mode = "regression")
workflow() |>
add_model(
gam_spec,
formula = canopy_area_2019 ~ s(land_area) + s(mean_temp)
) |>
add_formula(
canopy_area_2019 ~ land_area + mean_temp
) |>
fit_resamples(spatial_clustering_cv(boston_canopy))
#> # Resampling results
#> # 10-fold spatial cross-validation
#> # A tibble: 10 × 4
#> splits id .metrics .notes
#> <list> <chr> <list> <list>
#> 1 <split [589/93]> Fold01 <tibble [2 × 4]> <tibble [0 × 3]>
#> 2 <split [645/37]> Fold02 <tibble [2 × 4]> <tibble [0 × 3]>
#> 3 <split [637/45]> Fold03 <tibble [2 × 4]> <tibble [0 × 3]>
#> 4 <split [613/69]> Fold04 <tibble [2 × 4]> <tibble [0 × 3]>
#> 5 <split [598/84]> Fold05 <tibble [2 × 4]> <tibble [0 × 3]>
#> 6 <split [614/68]> Fold06 <tibble [2 × 4]> <tibble [0 × 3]>
#> 7 <split [587/95]> Fold07 <tibble [2 × 4]> <tibble [0 × 3]>
#> 8 <split [615/67]> Fold08 <tibble [2 × 4]> <tibble [0 × 3]>
#> 9 <split [622/60]> Fold09 <tibble [2 × 4]> <tibble [0 × 3]>
#> 10 <split [618/64]> Fold10 <tibble [2 × 4]> <tibble [0 × 3]>
Created on 2023-05-03 with reprex v2.0.2
This is wonky and the team knows it's wonky -- there's a related issue on this over at https://github.com/tidymodels/parsnip/issues/770 , and plans to work on better docs & error handling for mgcv models in the future.
Let me know if that helps :smile:
Thanks @mikemahoney218
I was getting tripped up with my own dataset as I was converting the response to a factor on the fly which tidy models doesn't seem to like - something along the lines of this:
workflow() |>
add_model(
gam_spec,
formula = factor(canopy_area_2019) ~ s(land_area) + s(mean_temp)
) |>
add_formula(
factor(canopy_area_2019) ~ land_area + mean_temp
) |>
fit_resamples(spatial_clustering_cv(boston_canopy))
If I go back and reformat the response column, then regenerate the spatialsamples::spatial_block_cv
object, this runs fine and returns the expected output.
Thanks again,
James
Awesome, glad to see it works :smile:
This issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.
Hi @mikemahoney218,
I've been using the
spatialsample
package to perform spatial cross-validation on an ensemble of species distribution models. However, I've got stuck when trying to fit a GAM usingmgcv::gam
.It looks like the tidymodel workflow set up for fitting GAMs is a little different to linear models, RFs of BRTs. So I don't know if this is an issue with passing spatialsample objects to GAMs or an issue with how I've formulated the gam workflow...?
Here's an example attempt to fit a gam to a spatialsample cross validation object using tidy models. You'll see the linear regression model workflow works, but the only way I can get a gam to work is to use
lapply
andfit
.Thanks,
James
Created on 2023-05-03 with reprex v2.0.2