tidymodels / parsnip

A tidy unified interface to models
https://parsnip.tidymodels.org
Other
564 stars 78 forks source link

`tunable.model_spec()` includes rows for non-tunable arguments marked with `tune()` #1104

Open simonpcouch opened 3 months ago

simonpcouch commented 3 months ago
library(tidymodels)

tunable(
  linear_reg() %>% 
  set_engine("glmnet", dfmax = tune())
)
#> # A tibble: 3 × 5
#>   name    call_info        source     component  component_id
#>   <chr>   <list>           <chr>      <chr>      <chr>       
#> 1 penalty <named list [2]> model_spec linear_reg main        
#> 2 mixture <named list [3]> model_spec linear_reg main        
#> 3 dfmax   <NULL>           model_spec linear_reg engine

Created on 2024-04-04 with reprex v2.1.0

simonpcouch commented 3 months ago

Okay, a few more notes here.

It seems like the architecture here is:

1) tunable() gets called on a boost_tree() model spec (or other model type) 2) tunable.boost_tree() kicks in 2a) it first calls tunable.model_spec(),

https://github.com/tidymodels/parsnip/blob/eb526fa1092b1c72e280ce5bd78f05392691938c/R/tunable.R#L279-L280

...which only excludes rows that are both main arguments and have no tunable information associated, so there's a bunch of engine arguments with NULL call_info

https://github.com/tidymodels/parsnip/blob/eb526fa1092b1c72e280ce5bd78f05392691938c/R/tunable.R#L43-L46

2b) then, it adds engine-specific parameter information. One way could be add_engine_parameters():

https://github.com/tidymodels/parsnip/blob/eb526fa1092b1c72e280ce5bd78f05392691938c/R/tunable.R#L282-L282

https://github.com/tidymodels/parsnip/blob/eb526fa1092b1c72e280ce5bd78f05392691938c/R/tunable.R#L56-L65

xgboost_engine_args is an inlined tibble in the source:

https://github.com/tidymodels/parsnip/blob/eb526fa1092b1c72e280ce5bd78f05392691938c/R/tunable.R#L91-L106

...the other possible way is to manually insert call_info entries:

https://github.com/tidymodels/parsnip/blob/eb526fa1092b1c72e280ce5bd78f05392691938c/R/tunable.R#L283-L286

Barring a rewrite that stores tunable information in the model environment (see #826), I think our solution might be to filter out rows with NULL call_info at the end of existing non-model_spec tunable() methods.