stevenpawley / recipeselectors

Additional recipes for supervised feature selection to be used with the tidymodels recipes package
https://stevenpawley.github.io/recipeselectors/

Recipe step for dealing with highly correlated values for ensemble stacked models #3

Open kamaulindhardt opened 3 years ago

kamaulindhardt commented 3 years ago

Dear Steven Pawley & Max Kuhn, and other enthusiasts,

This relates to my request for help on RStudio Community: https://community.rstudio.com/t/does-themis-package-feature-functions-for-dealing-with-continuous-data-imbalance/110432

I need a solution for my poorly performing stacked ensemble model. I suspect the problem is related to the feature pre-processing steps. Your package recipeselectors seems promising.

I already perform these preprocessing steps:

  step_impute_mode(Product) %>%
  step_novel(Site_Type, Tree, -all_outcomes()) %>%
  step_dummy(Site_Type, Tree, one_hot = TRUE, naming = partial(dummy_names, sep = "_")) %>%
  step_zv(all_numeric(), -all_outcomes()) %>%
  step_corr(all_numeric(), -all_outcomes()) %>%
  step_lincomb(all_numeric(), -all_outcomes()) %>%
  step_normalize(all_numeric(), -all_outcomes()) %>%
  step_impute_mode(all_nominal(), -all_outcomes()) %>%
  step_impute_knn(logRR)
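
To make clear what the step_corr() line above does on its own, here is a minimal sketch on made-up data. The `toy` data frame, its column names, and the explicit `threshold = 0.9` are assumptions for illustration only and are not taken from my actual dataset or recipe.

```r
library(recipes)

set.seed(1)

# Hypothetical toy data: x2 is a near-duplicate of x1, x3 is independent.
toy <- data.frame(
  y  = rnorm(50),
  x1 = rnorm(50)
)
toy$x2 <- toy$x1 + rnorm(50, sd = 0.01)
toy$x3 <- rnorm(50)

# Correlation filter only: remove predictors so that no remaining pair
# has an absolute correlation above the threshold.
rec <- recipe(y ~ ., data = toy) %>%
  step_corr(all_numeric_predictors(), threshold = 0.9)

# Estimate the filter on the training data, then apply it to that same data.
prepped <- prep(rec, training = toy)
baked   <- bake(prepped, new_data = NULL)

names(baked)
```

In this sketch only one of `x1` and `x2` should survive the filter, while `x3` and the outcome `y` are kept.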

As you can see in the model evaluation graphs, something goes wrong in the modelling. For some reason, my model performs exceptionally poorly, especially around the centre.

Here is a snapshot of my ensemble stacked model output.

image

image