stevenpawley / colino

Recipes Steps for Supervised Filter-Based Feature Selection
https://stevenpawley.github.io/colino/

`top_p` should be on (0, 3). #2

Open Steviey opened 1 year ago

Steviey commented 1 year ago

Following the example Max gave at the last conference in D.C., I am trying to tune top_p of step_select_vip() (which, by the way, works), but I get warnings...


      model_spec <- linear_reg(penalty = NULL, mixture = NULL) %>%
          set_engine("lm")

      set.seed(1234)

      recipe_spec <- recipe(myFormula, data = df_train) %>%
          #step_normalize(all_predictors()) %>%
          colino::step_select_vip(all_predictors(), model = model_spec,
                                  outcome = "y", top_p = tune())

      folds <- vfold_cv(df_train, repeats = 1, v = 5, strata = y)

      model_wfl <- workflow() %>%
          add_model(model_spec) %>%
          add_recipe(recipe_spec) %>%
          tune_grid(resamples = folds, grid = 25)

I get these warnings...

! Fold1: preprocessor 3/4: `top_p` should be on (0, 3).
! Fold1: preprocessor 4/4: `top_p` should be on (0, 3).
! Fold2: preprocessor 3/4: `top_p` should be on (0, 3).
! Fold2: preprocessor 4/4: `top_p` should be on (0, 3).
! Fold3: preprocessor 3/4: `top_p` should be on (0, 3).
! Fold3: preprocessor 4/4: `top_p` should be on (0, 3).
! Fold4: preprocessor 3/4: `top_p` should be on (0, 3).
! Fold4: preprocessor 4/4: `top_p` should be on (0, 3).
! Fold5: preprocessor 3/4: `top_p` should be on (0, 3).
! Fold5: preprocessor 4/4: `top_p` should be on (0, 3).
# Tuning results
# 5-fold cross-validation using stratification 
# A tibble: 5 × 4
  splits         id    .metrics         .notes          
  <list>         <chr> <list>           <list>          
1 <split [9/3]>  Fold1 <tibble [8 × 5]> <tibble [2 × 3]>
2 <split [9/3]>  Fold2 <tibble [8 × 5]> <tibble [2 × 3]>
3 <split [10/2]> Fold3 <tibble [8 × 5]> <tibble [2 × 3]>
4 <split [10/2]> Fold4 <tibble [8 × 5]> <tibble [2 × 3]>
5 <split [10/2]> Fold5 <tibble [8 × 5]> <tibble [2 × 3]>

There were issues with some computations:
  - Warning(s) x10: `top_p` should be on (0, 3).

What does this mean?

stevenpawley commented 1 year ago

Hello, just a quick note that I'll try to look into this once I get some time later this week.

stevenpawley commented 1 year ago

Sorry for the hiatus, but this appears to be related to the tuning grid, which is attempting to select more features than are available in your dataset. In that case, if top_n > n_features, the filter steps will set top_n = n_features and issue a warning.
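
One way to silence the warning is to bound the tuning range yourself so `top_p` can never exceed the number of predictors. A minimal sketch, assuming colino exports a `top_p()` dials parameter and reusing the names (`df_train`, `model_spec`, `recipe_spec`, `folds`) from the code above:

```r
library(tidymodels)

# Number of candidate predictors, excluding the outcome "y"
n_features <- ncol(df_train) - 1

# Build the workflow without tuning it yet
wfl <- workflow() %>%
    add_model(model_spec) %>%
    add_recipe(recipe_spec)

# Restrict the top_p range to [1, n_features] before building the grid.
# colino::top_p() is assumed here to be the dials parameter backing tune().
params <- extract_parameter_set_dials(wfl) %>%
    update(top_p = colino::top_p(range = c(1, n_features)))

res <- tune_grid(
    wfl,
    resamples = folds,
    grid      = grid_regular(params, levels = 5)
)
```

With the range capped at `n_features`, no grid candidate asks the step to keep more predictors than exist, so the clamping (and its warning) never triggers.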