Add wrapper methods random and forward feature selection

be-marc commented 5 years ago

fixes #24

This is a basic implementation of mlr’s makeFeatSelControlRandom and makeFeatSelControlSequential. The overall design is similar to mlr3tuning. I used many descriptions and some parts of the code from this package.

Classes

FeatureSelection*

Implements the generate_states method, which generates different feature combinations (states) in a 0-1 encoding.
- For FeatureSelectionRandom, n combinations are generated, where n depends on the batch_size.
- For FeatureSelectionForward, all combinations of one step are generated.
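
To make the 0-1 encoding concrete, here is a minimal base-R sketch of how states could be generated for both strategies. The function names, arguments, and the way max_features is applied are assumptions for illustration and are not taken from the PR's code.

```r
# Hypothetical helpers; names and signatures are illustrative only.

# Random search: draw `batch_size` states, each with between 1 and
# `max_features` features switched on (set to 1).
generate_states_random = function(n_features, batch_size, max_features = n_features) {
  lapply(seq_len(batch_size), function(i) {
    k = sample.int(max_features, 1L)        # how many features to switch on
    state = integer(n_features)             # start with all features off
    state[sample.int(n_features, k)] = 1L   # switch on k distinct features
    state
  })
}

# Forward selection: starting from the current state, create one candidate
# per feature that is still 0 by switching exactly that feature to 1.
generate_states_forward = function(current_state) {
  lapply(which(current_state == 0L), function(j) {
    state = current_state
    state[j] = 1L
    state
  })
}

set.seed(1)
random_states = generate_states_random(n_features = 5L, batch_size = 3L, max_features = 2L)
forward_states = generate_states_forward(c(1L, 0L, 0L, 1L))
```

In this sketch, one forward step from a state with two unselected features yields exactly two candidate states.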

PerformanceEvaluator

Implements the evaluate_states method, which takes the states as an argument. For each state, the task with all features is cloned and a selection is applied based on the encoding of the state. All states are evaluated with mlr3::benchmark.
On each call of evaluate_states, the states are stored as a new list entry in self$states and the benchmark object is stored as a new list entry in self$bmr.
Storing them as list entries is necessary so that FeatureSelectionForward can reconstruct the path of the stepwise selection.
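
The bookkeeping described above can be sketched without mlr3. The following is a simplified base-R stand-in, not the PR's R6 class: a closure plays the role of the object, and a hypothetical in-sample RMSE of a linear model replaces the resampled mlr3::benchmark call. All names here (make_evaluator, rmse_lm, get_history) are assumptions.

```r
# Illustrative stand-in for the evaluator; a closure keeps the per-call history.
make_evaluator = function(data, target, score_fun) {
  features = setdiff(names(data), target)
  history = list(states = list(), results = list())

  evaluate_states = function(states) {
    scores = vapply(states, function(state) {
      selected = features[state == 1L]
      # Score a reduced copy so the full data set stays untouched
      score_fun(data[, c(selected, target), drop = FALSE], target)
    }, numeric(1))
    # One list entry per call, so the sequence of steps can be replayed later
    history$states[[length(history$states) + 1L]] <<- states
    history$results[[length(history$results) + 1L]] <<- scores
    invisible(scores)
  }

  list(evaluate_states = evaluate_states, get_history = function() history)
}

# Hypothetical score: in-sample RMSE of a linear model (stand-in for resampling).
# Assumes at least one feature is selected.
rmse_lm = function(d, target) {
  fit = lm(reformulate(setdiff(names(d), target), response = target), data = d)
  sqrt(mean(residuals(fit)^2))
}

pe = make_evaluator(mtcars[, c("mpg", "wt", "hp", "qsec")], "mpg", rmse_lm)
pe$evaluate_states(list(c(1L, 0L, 0L), c(0L, 1L, 0L), c(0L, 0L, 1L)))  # step 1
pe$evaluate_states(list(c(1L, 1L, 0L), c(1L, 0L, 1L)))                  # step 2
h = pe$get_history()
```

After two calls, h$states and h$results each hold two entries, one per step, which is the structure a forward selection would need to trace its path.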

Terminator

Works similarly to the Terminator class in mlr3tuning.
TerminatorPerformanceStep is specifically designed to work with FeatureSelectionForward. It compares the last two chosen states and terminates if the performance improvement is below a certain threshold.
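
The stopping rule can be illustrated with a small base-R function. This is a sketch under stated assumptions: the function name is hypothetical, path_performance is assumed to hold the performance of the state chosen at each step, and higher values are assumed to be better (as with accuracy).

```r
# Hypothetical check, not the PR's implementation.
terminate_on_small_step = function(path_performance, threshold) {
  n = length(path_performance)
  if (n < 2L) {
    return(FALSE)  # two chosen states are needed for a comparison
  }
  improvement = path_performance[n] - path_performance[n - 1L]
  improvement < threshold
}

continue_case = terminate_on_small_step(c(0.70, 0.74), threshold = 0.01)
stop_case = terminate_on_small_step(c(0.70, 0.74, 0.745), threshold = 0.01)
```

Here continue_case is FALSE because the last step improved by about 0.04, while stop_case is TRUE because the last step improved by only about 0.005. For a measure that is minimized, the sign of the difference would have to be flipped.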

Discussion

We need to come up with a way to present the results. At the moment there are just two basic functions, get_result and get_optimization_path, which print the best feature combination or the steps of the feature selection.
Does it make sense to have the helper function binary_to_features, which converts the 0-1 encoding to feature names, as a private method in FeatureSelection?
max_features is not implemented for FeatureSelectionForward because it is something the TerminatorPerformanceStep object needs to know. I need to come up with an elegant way to do this. Maybe we have to make max_features an argument for TerminatorPerformanceStep and remove it as a setting for FeatureSelectionForward.
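
For reference, the conversion that binary_to_features performs can be sketched in a few lines of base R. The signature and the input checks are assumptions, and the feature names in the usage line are arbitrary examples.

```r
# Hypothetical standalone version of the helper discussed above.
binary_to_features = function(state, feature_names) {
  stopifnot(
    length(state) == length(feature_names),
    all(state %in% c(0, 1))
  )
  feature_names[state == 1]
}

selected = binary_to_features(c(1, 0, 1, 0), c("age", "glucose", "mass", "pressure"))
```

With this input, selected contains "age" and "mass". Whether the helper lives in a private method or a standalone function does not change this logic.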

be-marc commented 5 years ago

Example FeatureSelectionRandom + TerminatorEvaluations

# Specify the task
task = mlr_tasks$get("boston_housing")

# Define the learner
learner = mlr_learners$get("regr.rpart")

# Choose resampling strategy
resampling = mlr_resamplings$get("cv", param_vals = list(folds = 5L))

# Specify performance evaluator
pe = PerformanceEvaluator$new(task = task,
                              learner = learner,
                              resampling = resampling)

# Specify terminator
tm = TerminatorEvaluations$new(max_evaluations = 10)

# Specify wrapper method
fs = FeatureSelectionRandom$new(pe = pe,
                                tm = tm,
                                batch_size = 10,
                                max_features = 8)

# Run feature selection
fs$calculate()

# Get best selection
fs$get_result()

be-marc commented 5 years ago

Example FeatureSelectionForward + TerminatorPerformanceStep

# Specify the task
task = mlr_tasks$get("pima")

# Change measure
measures = mlr_measures$mget(c("classif.acc"))
task$measures = measures

# Define the learner
learner = mlr_learners$get("classif.rpart")

# Choose resampling strategy
resampling = mlr_resamplings$get("cv", param_vals = list(folds = 5L))

# Specify performance evaluator
pe = PerformanceEvaluator$new(task = task,
                              learner = learner,
                              resampling = resampling)

# Specify terminator
tm = TerminatorPerformanceStep$new(threshold = 0.01)

# Specify wrapper method
fs = FeatureSelectionForward$new(pe = pe, tm = tm)

# Run feature selection
fs$calculate()

# Get best selection
fs$get_result()

# Get optimization path
fs$get_optimization_path()

pat-s commented 5 years ago

> max_features is not implemented for FeatureSelectionForward because it is something the TerminatorPerformanceStep object needs to know. I need to come up with an elegant way to do this. Maybe we have to make max_features an argument for TerminatorPerformanceStep and remove it as a setting for FeatureSelectionForward.

Sounds reasonable. I think all kinds of termination should go into the terminator, and setting max_features is definitely a termination criterion. I would also put it into TerminatorEvaluations and document that it is only used for feature selection, not for hyperparameter tuning. This way FeatureSelectionForward can stay "simple" with only pe and tm.
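
One way this suggestion could look is sketched below in base R. The function name and arguments are hypothetical, higher performance is assumed to be better, and the actual terminator classes may be structured differently.

```r
# Hypothetical combined check: stop when max_features is reached or when
# the last step improved performance by less than `threshold`.
should_terminate = function(current_state, path_performance, threshold, max_features = Inf) {
  reached_max = sum(current_state) >= max_features
  n = length(path_performance)
  small_step = n >= 2L && (path_performance[n] - path_performance[n - 1L]) < threshold
  reached_max || small_step
}

capped = should_terminate(c(1, 1, 0, 0), c(0.70, 0.80), threshold = 0.01, max_features = 2)
uncapped = should_terminate(c(1, 1, 0, 0), c(0.70, 0.80), threshold = 0.01)
```

In this example capped is TRUE because two features are already selected, while uncapped is FALSE because the last step still improved performance by about 0.1 and no feature limit is set.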

> We need to come up with a way to present the results. At the moment there are just two basic functions, get_result and get_optimization_path, which print the best feature combination or the steps of the feature selection.

Do these two cover all the functionality of mlr::getFeatSelResult()? If by "come up with an idea how to present results" you mean possible visualizations, this should go into a separate PR for mlr3viz. See also ?mlr::plotFilterValues().

Misc

Can you please use a branch of mlr3featsel for this PR instead of your fork? That makes it easier to check out the branch and run the code.
> Does it make sense to have the helper function binary_to_features, which converts the 0-1 encoding to feature names, as a private method in FeatureSelection?

What is the current behavior? Does it return the 0/1 encoding?
Thanks for the good work. It looks really good and is a huge contribution.
Please add yourself as an author in the DESCRIPTION file of the package.
Please check the Travis errors.
Please always add a "fixes XY" as the first line of the PR so things are cross-linked and get closed automatically.
[ ] Tests are needed
[ ] Examples in the functions are needed

be-marc commented 5 years ago

Moved to #35
