mlr-org / mlr3

mlr3: Machine Learning in R - next generation
https://mlr3.mlr-org.com
GNU Lesser General Public License v3.0

Error in PipeOp high cardinality factor encoding using mlr3: Invalid 'col_roles' Names #1210

Closed by marineReg 1 week ago

marineReg commented 1 week ago

Hello, I posted my issue on Stack Overflow (https://stackoverflow.com/questions/79186614/error-in-pipeop-high-cardinality-factor-encoding-using-mlr3-invalid-col-roles). Following the comment there, I updated the packages and restarted both RStudio and my PC, but I'm still getting the error message. Thank you very much for your assistance.

be-marc commented 1 week ago

Hey,

What happens if you run this code on your machine?

# use clean environment
renv::init(bare = TRUE)
renv::install(c("mlr3verse", "mlr3spatiotempcv"))

library(mlr3verse)
library(mlr3spatiotempcv)

classif_task_sp = tsk("ecuador")
classif_task_sp$set_col_roles("slides", roles = c("target", "stratum"))
partition_classif_task_sp = mlr3::partition(classif_task_sp, ratio = 0.67)

factor_encoding = mlr3pipelines::po("removeconstants") %>>%
  ## mlr3pipelines::po("collapsefactors", no_collapse_above_prevalence = 0.01) %>>%
  mlr3pipelines::po("encodeimpact", affect_columns = selector_cardinality_greater_than(10), id = "high_cardinality_encoding") %>>%
  mlr3pipelines::po("encode", method = "one-hot", affect_columns = selector_cardinality_greater_than(3), id = "low_cardinality_encoding") %>>%
  mlr3pipelines::po("encode", method = "treatment", affect_columns = selector_type("factor"), id = "binary_encoding")

learner_glmnet = mlr3tuningspaces::lts(mlr3::lrn("classif.glmnet", predict_type = "prob", standardize = FALSE))
learner_glmnet_factor_encoding = mlr3::as_learner(factor_encoding %>>% learner_glmnet)

tuning = mlr3tuning::auto_tuner(tuner = mlr3tuning::tnr("grid_search", resolution = 5, batch_size = 10),
                                 learner = learner_glmnet_factor_encoding,
                                 resampling = mlr3::rsmp("spcv_coords", folds = 2),
                                 measure = mlr3::msr("classif.prauc"),
                                 terminator = mlr3tuning::trm("evals", n_evals = 2, k = 0))

run_resampling = mlr3::resample(classif_task_sp, learner = tuning, resampling = mlr3::rsmp("spcv_coords", folds = 2), store_models = TRUE)

run_training = tuning$train(classif_task_sp, row_ids = partition_classif_task_sp$train)

marineReg commented 1 week ago

Great! I tested your code as well as mine, and both work! Thank you very much for your help!

be-marc commented 1 week ago

You're welcome. Since the code runs in a clean renv environment, something is wrong with your regular R library. Deleting all installed packages and reinstalling them may help; then you won't need renv.
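
Before deleting anything, it may help to look at what is actually installed. The following is only a minimal base-R sketch of such a check, not something that was run in this thread: it lists the library paths, flags packages installed in more than one library, and prints the versions of the mlr3-related packages. The regular expression used to pick out the mlr3 family is an illustrative assumption.

```r
# Diagnostic sketch (assumption: base R only, nothing is modified or deleted).

# Libraries R searches, in order; earlier entries shadow later ones.
lib_paths = .libPaths()
print(lib_paths)

# All installed packages across those libraries.
ip = installed.packages()
rownames(ip) = NULL  # drop row names so duplicated package names cause no trouble
pkgs = as.data.frame(ip[, c("Package", "Version", "LibPath"), drop = FALSE],
                     stringsAsFactors = FALSE)

# Packages present in more than one library: a common source of version mix-ups.
dup_names = unique(pkgs$Package[duplicated(pkgs$Package)])
duplicated_pkgs = pkgs[pkgs$Package %in% dup_names, ]
print(duplicated_pkgs[order(duplicated_pkgs$Package), ])

# Versions of the mlr3-related packages (the pattern is a hypothetical filter).
mlr3_pkgs = pkgs[grepl("^(mlr3|paradox|bbotk)", pkgs$Package), ]
print(mlr3_pkgs)
```

If the output shows the same package in several libraries, or an old version of one mlr3 package next to newer versions of the others, reinstalling those packages would be a more targeted first step than wiping the whole library.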