tidymodels / recipes

Pipeable steps for feature engineering and data preprocessing to prepare for modeling
https://recipes.tidymodels.org

remove highly correlated categorical variables #1045

Open zhaoliang0302 opened 1 year ago

zhaoliang0302 commented 1 year ago

Hi,

step_corr() can remove highly correlated continuous variables using Pearson or Spearman correlation. However, the recipes package does not provide a comparable filter for categorical variables. I have 20 categorical columns (one-hot encoded), and I want to remove the redundant columns that are correlated with each other. Can you give me some advice? Thanks.

Best regards

EmilHvitfeldt commented 1 year ago

Hello @zhaoliang0302, I have been thinking about such steps for a while. Do you know of any existing methods that would work for such an operation?

corybrunson commented 5 months ago

Hi all, I just came across this issue. I reviewed the JOSS submission for {latentcor}, which is on CRAN and might provide a versatile solution for logical, numeric, and categorical variables.
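
For illustration only, here is a minimal sketch of the kind of filter asked about in this thread. It is not part of recipes or {latentcor}. It assumes Cramér's V (without bias correction) as the association measure, which is a choice made for this example rather than something proposed above, and the names `cramers_v` and `drop_redundant` are hypothetical. It is written in dependency-free Python and works on the original categorical columns rather than on their one-hot encodings.

```python
from collections import Counter
from math import sqrt


def cramers_v(x, y):
    """Cramér's V (no bias correction) for two equal-length categorical sequences."""
    n = len(x)
    joint = Counter(zip(x, y))   # observed cell counts
    row = Counter(x)             # marginal counts of x
    col = Counter(y)             # marginal counts of y
    k = min(len(row), len(col)) - 1
    if n == 0 or k == 0:
        return 0.0               # empty or constant column: no association to measure
    chi2 = 0.0
    for a in row:
        for b in col:
            expected = row[a] * col[b] / n
            observed = joint.get((a, b), 0)
            chi2 += (observed - expected) ** 2 / expected
    return sqrt(chi2 / (n * k))


def drop_redundant(columns, threshold=0.9):
    """Greedy filter (hypothetical helper): keep a column only if its V with
    every column kept so far is at or below the threshold."""
    kept = []
    for name, values in columns.items():
        if all(cramers_v(values, columns[other]) <= threshold for other in kept):
            kept.append(name)
    return kept


# Toy data, invented for this example.
data = {
    "color": ["red", "red", "blue", "blue", "green", "green"],
    "color_code": ["R", "R", "B", "B", "G", "G"],  # a relabelling of "color"
    "size": ["S", "L", "S", "L", "S", "L"],        # unrelated to "color"
}
print(drop_redundant(data))  # ['color', 'size']
```

This greedy rule simply keeps whichever column comes first, which may differ from how step_corr() decides which variable of a correlated pair to drop. The 0.9 threshold is likewise an arbitrary value chosen for the example.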