stevenpawley / recipeselectors

Additional recipes for supervised feature selection to be used with the tidymodels recipes package
https://stevenpawley.github.io/recipeselectors/

step_select_boruta and step_select_mrmr need method for internally handling NAs #11

Open kransom14 opened 1 year ago

kransom14 commented 1 year ago

step_select_boruta and step_select_mrmr cannot handle data with missing/NA values. This requires the user to remove or impute NAs in a recipe step prior to the feature selection step in order to use these feature selection steps, which might not be desirable. It would be handy if step_select_boruta and step_select_mrmr could internally omit NAs which would allow the user to preserve them in the training data.

stevenpawley commented 1 year ago

Is there a reason why adding a step_impute_* step before the filter-based step is undesirable? I guess if you specifically want your model to handle the missing values, e.g., if using XGBoost, then you might not want NAs imputed by another method? However, most steps in the 'recipes' package do not handle NAs, and given the diversity of imputation methods and the composable style of tidymodels, I'm not sure that adding NA handling on a step-by-step basis is the way to go.

You could add a 'missing' indicator column first using step_indicate_na, impute the NAs, run the feature selection step, and then potentially add the NAs back in if you want the model at the end of the pipeline to handle them using its own method.
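A rough sketch of that workaround, for illustration only. It assumes a toy copy of iris with NAs injected into Sepal.Length, median imputation as an arbitrary choice of imputer, and the Boruta package being installed. The final step_mutate that restores the NAs is a hypothetical way of "adding them back in", not something provided by recipeselectors; it relies on the indicator column being kept out of the selection step so that it survives to the end of the recipe.

```r
library(recipes)
library(recipeselectors)  # provides step_select_boruta()

# Toy data (assumption): iris with 15 missing values in one predictor
set.seed(1)
dat <- iris
dat$Sepal.Length[sample(nrow(dat), 15)] <- NA

rec <- recipe(Species ~ ., data = dat) %>%
  # 1. Record where values were missing (creates na_ind_Sepal.Length, 0/1)
  step_indicate_na(Sepal.Length) %>%
  # 2. Impute so that the selection step sees complete data
  step_impute_median(Sepal.Length) %>%
  # 3. Select features, leaving the indicator column out of the selection
  step_select_boruta(
    all_predictors(), -starts_with("na_ind_"),
    outcome = "Species"
  ) %>%
  # 4. Put the NAs back, but only if the imputed column was retained
  step_mutate(
    across(
      any_of("Sepal.Length"),
      ~ ifelse(na_ind_Sepal.Length == 1, NA_real_, .x)
    )
  )

# Estimate the recipe and return the processed training data
baked <- bake(prep(rec), new_data = NULL)
```

If Sepal.Length is retained by the selection step, it should contain the original NAs again in baked, so a model such as XGBoost can apply its own missing-value handling. The indicator column can be dropped afterwards (for example with step_rm) if it is not wanted as a predictor.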

kransom14 commented 1 year ago

Yes, it would be for development of a model capable of handling the NAs, e.g. XGBoost.

