stevenpawley / colino

Recipes Steps for Supervised Filter-Based Feature Selection
https://stevenpawley.github.io/colino/

Error step_select_infgain when tuning #1

Closed AlbertoAlmuinha closed 1 year ago

AlbertoAlmuinha commented 1 year ago

Hi,

When I run the recipe I get the result without problem:

library(tidymodels)
library(workflowsets)
library(tidyverse)
library(modeltime)
library(modeltime.resample)
library(timetk)
library(colino)
library(lubridate)

base_recipe <- 
            recipe(pd ~ ., data = data_prepared_tbl) %>%
            step_select_infgain(all_numeric_predictors(), outcome = "pd", top_p = 3, type = "symuncert") %>%
            step_corr(all_numeric_predictors(), threshold = 0.9, method = "pearson")

base_recipe %>% prep() %>% juice()

This code runs ok.

The problem comes when I use tune and workflowsets to tune the parameters of a model: I immediately get an error:

arima_spec <- 
  arima_reg(seasonal_period = 4, 
            non_seasonal_differences = tune(), 
            non_seasonal_ar = tune(), 
            non_seasonal_ma = tune()) %>%
  set_engine("arima")

cmb_models <- 
  workflow_set(
    preproc = list(base = base_recipe),
    models = list(arima = arima_spec),
    cross = TRUE
  )

set.seed(123)
cmb_models <- 
  cmb_models %>% 
  workflow_map("tune_grid", 
               resamples = resamples_tscv, 
               grid = tidyr::crossing(non_seasonal_ar = 0:4, 
                                      non_seasonal_differences = 0:1, 
                                      non_seasonal_ma = 0:4), 
               metrics = default_forecast_accuracy_metric_set(), 
               verbose = TRUE)

Error:
! Tibble columns must have compatible sizes.
* Size 2: Existing data.
* Size 3: Column `call_info`.
i Only values of size one are recycled.
Run `rlang::last_error()` to see where the error occurred.
Execution stopped; returning current results

If I comment out the step_select_infgain step and replace it with a step_select where I select the variables that the other step would have given me, the process works without any problem.

I get the feeling that there is something weird with step_select_infgain...

Sorry for not sharing a reprex, the data I was using is private.

endp01 commented 1 year ago

I encountered the same bug in step_select_vip and step_select_forest, and I might have found the reason it occurs.

If you look at tunable.step_select_infgain: https://github.com/stevenpawley/colino/blob/6738db1ede81b5c6fa2e65bb1c5aef1c6456afab/R/step_select_infgain.R#L243

The problem is the tibble creation: the name column lists only two parameters, which would explain the "Size 2" versus "Size 3: Column `call_info`" mismatch in the error. We need to add "cutoff" to this line, as in: name = c("top_p", "threshold", "cutoff"),
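
To illustrate the mismatch, here is a minimal sketch that reproduces the same tibble error outside of colino. The contents of call_info below are an assumption made for illustration (three entries, one per tunable parameter); only the name vector and its fix come from the line discussed above.

```r
library(tibble)

# Assumed stand-in for the call_info column: three entries,
# one for each of top_p, threshold and cutoff (illustrative values only).
call_info <- list(
  list(pkg = "colino", fun = "top_p"),
  list(pkg = "dials", fun = "threshold"),
  list(pkg = "colino", fun = "cutoff")
)

# Two names against three call_info entries: tibble() only recycles
# columns of size one, so this fails with a size-compatibility error.
broken <- tryCatch(
  tibble(name = c("top_p", "threshold"), call_info = call_info),
  error = function(e) conditionMessage(e)
)
print(broken)

# With "cutoff" added, both columns have size 3 and the tibble is built.
fixed <- tibble(name = c("top_p", "threshold", "cutoff"), call_info = call_info)
print(fixed)
```

The error only shows up during tuning, presumably because that is when tunable() is called on the recipe steps; prep() and juice() never build this tibble.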

Like I said, I had the same issue with multiple other step select functions that I wanted to use in workflows and I could fix it this way.

These should be the affected lines ("cutoff" is missing in all of them):

https://github.com/stevenpawley/colino/blob/6738db1ede81b5c6fa2e65bb1c5aef1c6456afab/R/step_select_aov.R#L207
https://github.com/stevenpawley/colino/blob/6738db1ede81b5c6fa2e65bb1c5aef1c6456afab/R/step_select_carscore.R#L234
https://github.com/stevenpawley/colino/blob/6738db1ede81b5c6fa2e65bb1c5aef1c6456afab/R/step_select_forests.R#L248
https://github.com/stevenpawley/colino/blob/6738db1ede81b5c6fa2e65bb1c5aef1c6456afab/R/step_select_infgain.R#L243
https://github.com/stevenpawley/colino/blob/6738db1ede81b5c6fa2e65bb1c5aef1c6456afab/R/step_select_linear.R#L260
https://github.com/stevenpawley/colino/blob/6738db1ede81b5c6fa2e65bb1c5aef1c6456afab/R/step_select_mrmr.R#L217
https://github.com/stevenpawley/colino/blob/6738db1ede81b5c6fa2e65bb1c5aef1c6456afab/R/step_select_relief.R#L247
https://github.com/stevenpawley/colino/blob/6738db1ede81b5c6fa2e65bb1c5aef1c6456afab/R/step_select_roc.R#L211
https://github.com/stevenpawley/colino/blob/6738db1ede81b5c6fa2e65bb1c5aef1c6456afab/R/step_select_tree.R#L248
https://github.com/stevenpawley/colino/blob/6738db1ede81b5c6fa2e65bb1c5aef1c6456afab/R/step_select_vip.R#L219
https://github.com/stevenpawley/colino/blob/6738db1ede81b5c6fa2e65bb1c5aef1c6456afab/R/step_select_xtab.R#L212

balraadjsings commented 1 year ago

Any updates on this? I've added the "cutoff" to the function through trace(colino:::tunable.step_select_mrmr, edit=T) but it still returns the same tibble error

endp01 commented 1 year ago

The way I changed the package was to download the whole git repository, change the lines in question, and install the patched package from the local copy with devtools. See here for a tutorial on how to install packages with devtools: https://kbroman.org/pkg_primer/pages/build.html
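
Roughly, that workflow could look like the sketch below. The directory name local_clone is a hypothetical placeholder for wherever the repository was cloned and edited; the install step is skipped if that directory does not exist.

```r
# Hypothetical path to the locally cloned and edited colino repository.
local_clone <- "path/to/colino"

if (dir.exists(local_clone)) {
  # Build and install the package from the edited source tree.
  # Requires the devtools package to be installed.
  devtools::install(local_clone)
} else {
  message("No local clone found at: ", local_clone)
}
```

After installing, restart the R session so the patched version is the one that gets loaded.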

balraadjsings commented 1 year ago

Perfect, it works now! That seemed to do the trick, thanks :)

stevenpawley commented 1 year ago

Should be fixed by c2d46d0