tidymodels / parsnip

A tidy unified interface to models
https://parsnip.tidymodels.org

mars fit fails if prune_method='cv' and prod_degree is specified #432

Open smingerson opened 3 years ago

smingerson commented 3 years ago

Problem

Fitting fails when prod_degree is specified together with prune_method = "cv". The equivalent earth::earth() call, taken from translate(), runs without error, so I think this is a bug in parsnip. The parsnip model does fit with the other pruning methods, except "exhaustive". I did not include "exhaustive" here because the corresponding earth::earth() call also failed, and diagnosing that is beyond my ability.

Using the development version of parsnip and earth 5.3.0.

library(parsnip)
library(earth)
#> Loading required package: Formula
#> Loading required package: plotmo
#> Loading required package: plotrix
#> Loading required package: TeachingDemos
mod <- mars(
  mode = "regression",
  prune_method = "cv"
) %>%
  set_engine("earth",
             nfold = 5, ncross = 2
  )
translate(mod)
#> MARS Model Specification (regression)
#> 
#> Main Arguments:
#>   prune_method = cv
#> 
#> Engine-Specific Arguments:
#>   nfold = 5
#>   ncross = 2
#> 
#> Computational engine: earth 
#> 
#> Model fit template:
#> earth::earth(formula = missing_arg(), data = missing_arg(), weights = missing_arg(), 
#>     pmethod = "cv", nfold = 5, ncross = 2, keepxy = TRUE)
set.seed(700)
N <- 1000
dt <- data.frame(x1 = rnorm(N, 5, 3), x2 = rnorm(N, 2, 1))
dt$y <- dt$x1 + dt$x2 + rnorm(N, sd = .5)
# Works
fit(mod, y~., data = dt)
#> parsnip model object
#> 
#> Fit time:  261ms 
#> Selected 5 of 5 terms, and 2 of 2 predictors (pmethod="cv")
#> Termination condition: RSq changed by less than 0.001 at 5 terms
#> Importance: x1, x2
#> Number of terms at each degree of interaction: 1 4 (additive model)
#> GRSq 0.9746102  RSq 0.9750152  mean.oof.RSq 0.9743736 (sd 0.00269)
#> 
#> pmethod="backward" would have selected the same model:
#>     5 terms 2 preds,  GRSq 0.9746102  RSq 0.9750152  mean.oof.RSq 0.9743736
# Fails
mod <- update(mod, prod_degree = 2)
fit(mod, y ~ ., data = dt)
#> Error: 'degree' must be numeric, or TRUE or FALSE (whereas its current class is "quosure,formula")
#> Timing stopped at: 0.23 0 0.23
translate(mod)
#> MARS Model Specification (regression)
#> 
#> Main Arguments:
#>   prod_degree = 2
#>   prune_method = cv
#> 
#> Engine-Specific Arguments:
#>   nfold = 5
#>   ncross = 2
#> 
#> Computational engine: earth 
#> 
#> Model fit template:
#> earth::earth(formula = missing_arg(), data = missing_arg(), weights = missing_arg(), 
#>     degree = 2, pmethod = "cv", nfold = 5, ncross = 2, keepxy = TRUE)
# equivalent earth call succeeds
earth::earth(y~., data = dt, pmethod = 'cv', degree = 2, nfold = 5, ncross = 2, keepxy = TRUE)
#> Selected 5 of 5 terms, and 2 of 2 predictors (pmethod="cv")
#> Termination condition: RSq changed by less than 0.001 at 5 terms
#> Importance: x1, x2
#> Number of terms at each degree of interaction: 1 4 (additive model)
#> GRSq 0.9745074  RSq 0.9750152  mean.oof.RSq 0.9740412 (sd 0.00346)
#> 
#> pmethod="backward" would have selected the same model:
#>     5 terms 2 preds,  GRSq 0.9745074  RSq 0.9750152  mean.oof.RSq 0.9740412

Created on 2021-02-16 by the reprex package (v1.0.0)
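
The class reported in the error, "quosure,formula", is the class vector of an rlang quosure. A minimal sketch of that observation follows. It only shows that a quosure is not numeric until it is evaluated. It is an assumption, not something confirmed in this thread, that earth receives prod_degree in this unevaluated form on the "cv" code path; degree_quo is a hypothetical stand-in, not parsnip's internal object.

```r
library(rlang)

# Hypothetical stand-in for a captured argument such as prod_degree = 2
# (assumption: this mirrors how the value reaches earth; not verified here).
degree_quo <- quo(2)

# The class vector matches the one printed in the error message.
quo_class <- class(degree_quo)
print(quo_class)

# The quosure itself is not numeric, so a numeric check on it fails.
is_num_before <- is.numeric(degree_quo)

# Evaluating the quosure recovers the plain value.
degree_val <- eval_tidy(degree_quo)
is_num_after <- is.numeric(degree_val)
```

If that assumption holds, the failure would come from the argument not being evaluated before earth checks it, rather than from the value 2 itself.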

juliasilge commented 3 years ago

I looked at this a little bit today; it seems like a weird one to me.

dchiu911 commented 11 months ago

Any updates on this bug?

Either I leave nfold unset, and tuning fails whenever prune_method is "cv":

mars_model <- list(
  mars = mars(
    mode = "classification",
    engine = "earth",
    num_terms = tune(),
    prod_degree = tune(),
    prune_method = tune()
  )
)
→ A | error:   the nfold argument must be specified when pmethod="cv"

or I set nfold so that "cv" can be tuned, and it breaks the other methods:

mars_model <- list(
  mars = mars(
    mode = "classification",
    num_terms = tune(),
    prod_degree = tune(),
    prune_method = tune()
  ) %>% 
    set_engine("earth", nfold = 5)
)
→ A | error:   'degree' must be numeric, or TRUE or FALSE (whereas its current class is "quosure,formula")
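
One possible stopgap, sketched below under assumptions, is to leave nfold unset and tune over an explicit grid that omits "cv". The candidate values for num_terms and prod_degree are illustrative only, and "exhaustive" is left out because it was reported as failing earlier in this thread. The grid is built with base R; the idea would be to pass it as the grid argument of tune::tune_grid(), which has not been tested against this bug.

```r
# Candidate pruning methods, leaving out "cv" (needs nfold) and
# "exhaustive" (reported as failing in earth itself above).
prune_candidates <- c("backward", "forward", "none")

# Illustrative values only; adjust to the data at hand.
mars_grid <- expand.grid(
  num_terms = c(3L, 5L, 7L),
  prod_degree = 1:2,
  prune_method = prune_candidates,
  stringsAsFactors = FALSE
)

# One row per combination of the three tuning parameters.
n_candidates <- nrow(mars_grid)
print(n_candidates)
```

This avoids the bug rather than fixing it, so "cv" pruning still cannot be compared against the other methods in the same tuning run.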