mlr-org / mlr

Machine Learning in R
https://mlr.mlr-org.com
Other
1.64k stars 404 forks source link

How to use removing constant features EACH TIME just before wrapped variable selection #1810

Closed jyk closed 7 years ago

jyk commented 7 years ago

Hi! I have next issue with mlr: I am using feature selection using filtering method. The problem is that feature selection will fail if there are at least one feature with constant value. I tried to add wrapper "makeRemoveConstantFeaturesWrapper" to learner, but the problem is that it will not be used each time JUST BEFORE feature selection (which has to be used several times - during finding optimal number / percent of features to be retained after feature selection + final feature selection before hyperparameter optimization and final training). Briefly my question is. What can be changed in next function in order to have removal of constant features just before each execution of feature selection ?

My function:

#' Classification with Extreme Gradient Boosting
#'
#' Classification with Extreme Gradient Boosting using hyperparameter tuning and feature selection
#' @param df_tain Data frame for training of model
#' @param df_test Data frame for testing of model
#' @param Y character of response variable. This response variable in data frames 'df_tain' and 'df_test'
#'        has to be binary factor variable with factors "Yes" (event) and "No" (no event))
#' @param cl Class of learner as character. Please see http://mlr-org.github.io/mlr-tutorial/release/html/integrated_learners/
#'        for all possible learners for classification task.
#'        At the moment only 'classif.xgboost' is supported by current function
#' @param eval_metric Evaluation metric for learner
#' @param HypParTuneMethod Method for tuning hyperparameters:
#'        MBO - Model based (bayesian) - default
#'        Iterated_FRacing - Iterated F-Racing
#' @param SurLearnerMBO Surrogate learner for MBO
#' @param NRMaxExpHypTun Maximum number of experiments for hyperparameters tuning
#' @param rdescHypParTune Description of a resampling algorithm for hyperparameters tuning as 'ResampleDesc' object of mlr package
#'                        Please see help for function mlr::makeResampleDesc
#' @param VarSelMethod Variable (feature) selection method.
#'        Next values are possible:
#'        'Filter' if filter method will be used (default);
#'        'Wrapper' if wrapper method will be used (currently not supported)
#'        'None' if no variable selection is needed prior modeling
#'
#' @param VarSelFilterTuneMethod Type for tuning of number of variables in variable selection with filtering method
#'                               if 'fw.perc' - then select VarSelFilterTunePercVal[i] * 100 top scoring features;
#'                               if 'fw.abs' - then select absolute number of top scoring features
#'
#' @param VarSelFilterTunePercVal Sequence of percentages of number of best variables to be chosen
#'                          with variable selection using filtering
#'                          This sequence will be used for tuning the size of the feature subset
#'                          from the top of list of variables calculated by filtering method by criteria 'FilterStat'
#' @param VarSelFilterTuneAbsVal Sequence of absolute numbers of best variables to be chosen
#'                          with variable selection using filtering
#'                          This sequence will be used for tuning the size of the feature subset
#'                          from the top of list of variables calculated by filtering method by criteria 'FilterStat'
#' @param FilterStat Method for calculating the feature importance for variable selection with filter method.
#'                   Please see https://mlr-org.github.io/mlr-tutorial/release/html/filter_methods/index.html
#'                   for available all methods.
#'
#'
#' @param VarSelWrapperLearner Learner for variable selection with wrapper method as character.
#'                             Please see http://mlr-org.github.io/mlr-tutorial/release/html/integrated_learners/
#'                             for all possible learners
#'
#' @param VarSelWrapperFeatSelControl mlr::FeatSelControl object for control structures for feature selection with wrapper method.
#'                                    Please see help for mlr::FeatSelControl function how to set it.
#'
#' @param rdescVarSelTune Description of a resampling algorithm for tuned variable selection as 'ResampleDesc' object of mlr package.
#'                        Please see help for function mlr::makeResampleDesc
#'
#' @return List of next elements:
#'   \itemize{
#'      \item{model - Fitted model with R package 'mlr'}{}
#'      \item{vars_in_model - Variables used of training of model (after variable selection)}{}
#'      \item{conf_mat - Confusion matrix calculated with R package 'caret'}{}
#'      \item{event_prob_test - Predicted probabilities of event}{}
#'      \item{event_label_true_test - True labels of event}{}
#'      \item{event_label_pred_test - Predicted labels of event (cut-off = 0.5)}{}
#'   }
#' @author Jüri Kuusik
#' @encoding UTF-8
#' @note
#' For variable selection two classes of methods are available in this function:
#' \strong{Filter Methods}
#' Filter methods select variables regardless of the model. They put the features in an
#' ordinal list by general features like correlation with the variable to predict or the variance
#' in them. The ranked features then provide a list to make a decision of keeping or
#' removing features based on ranks. Filter methods are often univariate and consider the
#' features independently of other features. The scoring can be done by univariate or with
#' regard to the dependent variable. Some of the best known filter techniques include chi square test, correlation
#' coefficients, and information gain metrics. For example, we know that high variance
#' in the data normally reflects more information in it. In filter methods, we can filter out
#' the features that have low variance and keep the ones with high variance for further analysis.
#' \strong{Wrapper Methods}
#' Wrapper methods consider a set of features to find the best subset of features for a
#' modeling problem. This method treats the features selection process as a search problem,
#' where different combinations of features are tested against performance criteria and
#' compared with other combinations. A predictive model is used to evaluate the different
#' sets of features and an accuracy metric is used to score the set of features. The set of
#' features with the highest accuracy measure is chosen for modeling.
#' The search process may use heuristics like forward selection, backward selection,
#' and so on, or be probabilistic such as random hill-climbing algorithm. Or it may also
#' methodological, like best-fit search or full brute force search. Another advanced example
#' of a wrapper method is the recursive feature elimination algorithm. A simple example can
#' be constructed around forward selection of variable subset; the model starts with a single
#' variable and then starts adding more variables by measuring how much improvement
#' the new variable brings into the model. When addition of a variable doesn’t bring any
#' improvement in the model, we stop. This way, we can search model subset space to find
#' the best subset.
#' @references
#' 1. Machine Learning in R: mlr Tutorial (https://mlr-org.github.io/mlr-tutorial/release/html/index.html)
#' 2. mlrMBO: A Modular Framework for Model-Based Optimization of Expensive Black-Box Functions; Bernd Bischl et. al
#' 3. Machine Learning Using R; Karthik Ramasubramanian, Abhishek Singh; 2017
#' @examples
#'  if (!require("AppliedPredictiveModeling")) {
#'    install.packages("AppliedPredictiveModeling", dependencies = TRUE)
#'    library(AppliedPredictiveModeling)
#'  }
#'   if (!require("caret")) {
#'    install.packages("caret", dependencies = TRUE)
#'    library(caret)
#'   }
#'  set.seed(100)
#'  data(GermanCredit)
#'  GermanCredit <- mlr::createDummyFeatures(obj=GermanCredit, target='Class')
#'  # First, remove near-zero variance predictors then get rid of a few predictors
#'  # that duplicate values. For example, there are two possible values for the
#'  # housing variable: "Rent", "Own" and "ForFree". So that we don't have linear
#'  # dependencies, we get rid of one of the levels (e.g. "ForFree")
#'  GermanCredit <- GermanCredit[, -nearZeroVar(GermanCredit)]
#'  GermanCredit$CheckingAccountStatus.lt.0 <- NULL
#'  GermanCredit$SavingsAccountBonds.lt.100 <- NULL
#'  GermanCredit$EmploymentDuration.lt.1 <- NULL
#'  GermanCredit$EmploymentDuration.Unemployed <- NULL
#'  GermanCredit$Personal.Male.Married.Widowed <- NULL
#'  GermanCredit$Property.Unknown <- NULL
#'  GermanCredit$Housing.ForFree <- NULL
#'  # Rename case as "Yes" and non-case as "No"
#'  GermanCredit$Class <- ifelse(GermanCredit$Class == "Bad", "Yes", "No")
#'  GermanCredit$Class <- factor(GermanCredit$Class)
#'  #head(GermanCredit)
#'  inTrain <- createDataPartition(GermanCredit$Class, p = .8)[[1]]
#'  DF.train <- GermanCredit[ inTrain, ]
#'  DF.test  <- GermanCredit[-inTrain, ]
#'  vecNumVars <- names(DF.train)[which(!names(DF.train) %in% c('Class'))]
#'  for (NumVar in vecNumVars){
#'    DF.train[[NumVar]] <- as.numeric(DF.train[[NumVar]])
#'  }
#' # without variable selection
#' m1 <- MODELING_Classification_Extreme_Gradient_Boosting(df_tain=DF.train,
#'                                                        df_test=DF.test,
#'                                                        Y='Class',
#'                                                        VarSelMethod='None'
#' )
#' m1
#' # with Feature selection - filtering - absolute number from top list of best variables
#' m2 <- MODELING_Classification_Extreme_Gradient_Boosting(df_tain=DF.train,
#'                                                        df_test=DF.test,
#'                                                        Y='Class',
#'                                                        VarSelMethod='Filter',
#'                                                        VarSelFilterTuneMethod='fw.abs',
#'                                                        VarSelFilterTuneAbsVal=c(20,32,42)
#' )
#' m2
#' # with Feature selection - wrapper
#' m3 <- MODELING_Classification_Extreme_Gradient_Boosting(df_tain=DF.train,
#'                                                        df_test=DF.test,
#'                                                        Y='Class',
#'                                                        VarSelMethod='Wrapper',
#'                                                        VarSelWrapperFeatSelControl=mlr::makeFeatSelControlSequential(method = "sfs", alpha = 0.0000000000000000000000000001)
#' )
#' m3
#'
#' # with Feature selection - wrapper - error will happened due to too low number of remaining variables
#' m4 <- MODELING_Classification_Extreme_Gradient_Boosting(df_tain=DF.train,
#'                                                         df_test=DF.test,
#'                                                         Y='Class',
#'                                                         VarSelMethod='Wrapper',
#'                                                         VarSelWrapperFeatSelControl=mlr::makeFeatSelControlSequential(method = "sfs", alpha = 0.01)
#' )
#' rm(m1,m2,m3)
#' @keywords Modeling, classification, extreme gradient boosting, hyperparameters tuning, model based optimization, feature selection
#' @export MODELING_Classification_Extreme_Gradient_Boosting
#' @import mlr mlrMBO DiceKriging FSelector xgboost CORElearn
MODELING_Classification_Extreme_Gradient_Boosting <- function(df_tain,
                                                              df_test,
                                                              Y,
                                                              cl="classif.xgboost",
                                                              eval_metric="auc",
                                                              HypParTuneMethod='MBO',
                                                              SurLearnerMBO='regr.km',
                                                              NRMaxExpHypTun=100L,
                                                              rdescHypParTune=mlr::cv3,
                                                              VarSelMethod='Filter',
                                                              VarSelFilterTuneMethod='fw.perc',
                                                              VarSelFilterTunePercVal=seq(0.05, 0.5, 0.05),
                                                              VarSelFilterTuneAbsVal=NULL,
                                                              VarSelFilterStat='information.gain',
                                                              VarSelWrapperLearner='classif.xgboost',
                                                              VarSelWrapperFeatSelControl=mlr::makeFeatSelControlSequential(method = "sfs", alpha = 0.01),
                                                              rdescVarSelTune=mlr::cv3
) {

  MISC_TryCatch_W_E <- function(expr) {
    W <- NULL
    w.handler <- function(w) {
      # warning handler
      W <<- w
      invokeRestart("muffleWarning")
    }
    list(value = withCallingHandlers(tryCatch(expr, error = function(e) e), warning = w.handler), warning = W)
  }

  # ********************************************************************************************************************************************
  # Setting parameters for modeling ************************************************************************************************************
  # ********************************************************************************************************************************************

  library(mlr)
  library(mlrMBO)

  # Setting learner
  lrn <- mlr::makeLearner(cl = cl,
                          eval_metric = eval_metric,
                          predict.type = "prob"
  )

  # Add a wrapper for removing constant features from training data
  lrn <- mlr::makeRemoveConstantFeaturesWrapper(learner = lrn, dont.rm = Y)

  # Setting classification tasks
  # Specify the type of analysis (e.g. classification), provide data, response variable and set positive cases for classification

  # Task for hyperparamaters' optimization and model training
  task_train <- mlr::makeClassifTask(data = df_tain,
                                     target = Y,
                                     positive = "Yes"
  )

  # Task for testing
  task_test <- mlr::makeClassifTask(data = df_test,
                                    target = Y,
                                    positive = "Yes"
  )

  # ********************************************************************************************************************************************
  # Variable selection  ************************************************************************************************************************
  # ********************************************************************************************************************************************

  # Filter method *******************************************************************
  if (VarSelMethod == 'Filter'){

    # Adding filtering class to learner
    if (VarSelFilterStat=='information.gain'){

      # We want to use the implementation of info gain in CORElearn, not Weka:
      infGain = mlr::makeFilter(
        name = "InfGain",
        desc = "Information gain ",
        pkg  = "CORElearn",
        supported.tasks = c("classif", "regr"),
        supported.features = c("numerics", "factors"),
        fun = function(task, nselect, ...) {
          CORElearn::attrEval(
            getTaskFormula(task),
            data = getTaskData(task),
            estimator = "InfGain", ...
          )
        }
      )

      lrn <-  mlr::makeFilterWrapper(learner = cl, fw.method = "InfGain")

    } else {

      lrn <- mlr::makeFilterWrapper(learner = cl, fw.method = VarSelFilterStat)

    }

    # Tune the number of variables used for picking best variables after variable selection

    if (VarSelFilterTuneMethod == 'fw.perc'){
      ps <- ParamHelpers::makeParamSet(ParamHelpers::makeDiscreteParam(VarSelFilterTuneMethod, values = VarSelFilterTunePercVal))
    } else if (VarSelFilterTuneMethod == 'fw.abs'){
      ps <- ParamHelpers::makeParamSet(ParamHelpers::makeDiscreteParam(VarSelFilterTuneMethod, values = VarSelFilterTuneAbsVal))
    }

    res <- mlr::tuneParams(lrn,
                           task = task_train,
                           resampling = rdescVarSelTune,
                           par.set = ps,
                           control = makeTuneControlGrid(),
                           show.info = FALSE
    )

    # Generate a new wrapped learner with the optimal percentage value / absolute number
    if (VarSelFilterTuneMethod == 'fw.perc'){
      lrn <- mlr::makeFilterWrapper(learner = cl, fw.method = VarSelFilterStat, fw.perc = res$x$fw.perc)
    } else if (VarSelFilterTuneMethod == 'fw.abs'){
      lrn <- mlr::makeFilterWrapper(learner = cl, fw.method = VarSelFilterStat, fw.abs = res$x$fw.abs)
    }

    # Wrapper method *******************************************************************
  } else if (VarSelMethod == 'Wrapper'){

    # Select a feature subset
    sfeats <- mlr::selectFeatures(learner = VarSelWrapperLearner,
                                  task = task_train,
                                  resampling = rdescVarSelTune,
                                  control = VarSelWrapperFeatSelControl,
                                  show.info = FALSE
    )

    # Redefefine task using only selected variables
    task_train <- mlr::makeClassifTask(data = df_tain[,c(Y,sfeats$x)],
                                       target = Y,
                                       positive = "Yes"
    )

  }

  # ********************************************************************************************************************************************
  # Hyperparameter optimization  ***************************************************************************************************************
  # ********************************************************************************************************************************************

  # Setting hyperparameters' space
  ps <- ParamHelpers::makeParamSet(
    makeIntegerParam(id = "nrounds", lower = 200, upper = 2500, default = 200),
    makeNumericParam(id = "eta", lower = -7, upper = -5, default = -6, trafo = function(x) 2^x),
    makeIntegerParam(id = "max_depth", lower = 3, upper = 15, default = 3),
    makeNumericParam(id = "colsample_bytree", lower = 0.3, upper = 1, default = 0.6),
    makeNumericParam(id = "subsample", lower = 0.3, upper = 1, default = 0.6)
  )

  # Setting control for tuning

  if (HypParTuneMethod=='MBO'){

    # Setting control object for MBO optimization
    mbo.ctrl <- mlrMBO::makeMBOControl()

    # Extends an MBO control object with infill criteria and infill optimizer options
    # mbo.ctrl <- setMBOControlTermination(mbo.ctrl, iters = 100)

    # Defining surrogate learner for MBO
    surrogate.lrn <- mlr::makeLearner(SurLearnerMBO, predict.type = "se")

    # Create control object for hyperparameter tuning with MBO
    ctrl <- mlr::makeTuneControlMBO(learner = surrogate.lrn, mbo.control = mbo.ctrl)

  } else if (HypParTuneMethod=='Iterated_FRacing'){

    ctrl = mlr::makeTuneControlIrace(maxExperiments = NRMaxExpHypTun)

  }

  # Tuning hyperparameters
  parallelMap::parallelStartSocket()

  # Set prediction type to prob
  lrn <- mlr::setPredictType(lrn, predict.type = "prob")

  TuneRes <- MISC_TryCatch_W_E(mlr::tuneParams(learner = lrn,
                                                         task = task_train,
                                                         resampling = rdescHypParTune,
                                                         par.set = ps,
                                                         control = ctrl,
                                                         show.info = FALSE,
                                                         measures = mlr::auc
  )
  )$value

  if (is.null(TuneRes$message)) { # Not an error message
    parallelMap::parallelStop()
    # Set found optimal hyperparameters for learner
    lrn <- mlr::setHyperPars(lrn, par.vals = TuneRes$x)
  } else {
    parallelMap::parallelStop()
    stop(paste0('Error during hyperparameters tuning: ',TuneRes$message))
  }

  # ********************************************************************************************************************************************
  # Model training and evaluation **************************************************************************************************************
  # ********************************************************************************************************************************************

  # Train model with optimal hyperparameters on train data set
  my_model <- mlr::train(learner=lrn, task=task_train)

  # Predict on test data set
  my_prediction <- predict(my_model, task = task_test)

  # Probabilities of event
  pos_probs <- mlr::getPredictionProbabilities(my_prediction)

  # True labels of event
  pos_true <- mlr::getPredictionTruth(my_prediction)

  # Predicted labels of event (cut-off = 0.5)
  pos_predict_label <- mlr::getPredictionResponse(my_prediction)

  # confusion matrix
  cm <- caret::confusionMatrix(data=pos_predict_label,
                               reference=pos_true,
                               positive='Yes'
  )

  # Get variables in the model
  if (VarSelMethod == 'Filter'){
    my_features <- mlr::getFilteredFeatures(my_model)
  } else if (VarSelMethod == 'Wrapper' | VarSelMethod == 'None'){
    my_features <- my_model$features
  }

  # Returning function value  *********************************

  return(list(model = my_model,
              vars_in_model = my_features,
              conf_mat = cm,
              event_prob_test = pos_probs,
              event_label_true_test = pos_true,
              event_label_pred_test = pos_predict_label
  )
  )

}
mb706 commented 7 years ago

This is quite some function!

I didn't really go through it, but you should try to wrap a learner with the filter wrapper and then add the RemoveConstantFeaturesWrapper. Your code looks like it is trying to do that in the wrong order. The 'outermost' wrapper does its job first.

Also, on first glance it looks like you are doing

lrn <- mlr::makeRemoveConstantFeaturesWrapper(learner = lrn, dont.rm = Y)

but then never using that lrn again and instead create a wrapper around a new learner (created from cl)

lrn <-  mlr::makeFilterWrapper(learner = cl, fw.method = "InfGain")

So not only is it the wrong order, the makeRemoveConstantFeaturesWrapper really seems to have no effect on the code at all.

Hope I could help!

jyk commented 7 years ago

Thanks! I will try. Just one more comment about wrappers. Let us say that I have 3 different wrappers for preprocessing. Firstly, I would like to add some derivates into my training and validation data by predicting (wrapper # 1), then I would like to remove constant features (wrapper # 2) and then do feature selection(wrapper #3). Then do I have to execute them in the next order: wrapper 3, then wrapper 2 and then wrapper 1 before training/hyperparameters tuning ?

jakob-r commented 7 years ago

Precisely spoken, you do not execute the wrapper, you just apply them to a learner, layer per layer. Inside is the learner. Before the final learner is called you want mlr to execute wrapper # 3 so you have to wrap wrapper # 3 around the learner. Then you wrap that with wrapper # 2 and so on. So yes you are right with the mentioned order.

jyk commented 7 years ago

Thanks!

jyk commented 7 years ago

By the way, i did not found this explanation in mlr online documentation (I mean the order of setting layers of wrappers). Sorry, if I am wrong...

jakob-r commented 7 years ago

I opened an issue there. I will close here. If something is unclear you can reopen.

jyk commented 7 years ago

Ok, many thanks!