Closed be-marc closed 5 years ago
```r
# Specify the task
task = mlr_tasks$get("boston_housing")
# Define the learner
learner = mlr_learners$get("regr.rpart")
# Choose resampling strategy
resampling = mlr_resamplings$get("cv", param_vals = list(folds = 5L))
# Specify performance evaluator
pe = PerformanceEvaluator$new(task = task,
                              learner = learner,
                              resampling = resampling)
# Specify terminator
tm = TerminatorEvaluations$new(max_evaluations = 10)
# Specify wrapper method
fs = FeatureSelectionRandom$new(pe = pe,
                                tm = tm,
                                batch_size = 10,
                                max_features = 8)
# Run feature selection
fs$calculate()
# Get best selection
fs$get_result()
```
```r
# Specify the task
task = mlr_tasks$get("pima")
# Change measure
measures = mlr_measures$mget(c("classif.acc"))
task$measures = measures
# Define the learner
learner = mlr_learners$get("classif.rpart")
# Choose resampling strategy
resampling = mlr_resamplings$get("cv", param_vals = list(folds = 5L))
# Specify performance evaluator
pe = PerformanceEvaluator$new(task = task,
                              learner = learner,
                              resampling = resampling)
# Specify terminator
tm = TerminatorPerformanceStep$new(threshold = 0.01)
# Specify wrapper method
fs = FeatureSelectionForward$new(pe = pe, tm = tm)
# Run feature selection
fs$calculate()
# Get best selection
fs$get_result()
# Get optimization path
fs$get_optimization_path()
```
max_features is not implemented for FeatureSelectionForward because it is something the TerminatorPerformanceStep object needs to know. I need to come up with an elegant way to do this. Maybe we have to make max_features an argument for TerminatorPerformanceStep and remove it as a setting for FeatureSelectionForward
Sounds reasonable. I think all kinds of termination should go into the terminator, and setting `max_features` is definitely a termination criterion. I would also put it into `TerminatorEvaluations` and document that it is only used for feature selection but not for hyperparameter tuning. This way `FeatureSelectionForward` can stay "simple" with only `pe` and `tm`.
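To make the idea concrete, here is a minimal standalone sketch (not the actual mlr3featsel API; function and argument names are hypothetical) of a combined termination check where `max_features` lives in the terminator, so the feature selection class itself only needs `pe` and `tm`:

```r
# Hypothetical combined termination check: stop when either the
# evaluation budget is exhausted or a candidate state already selects
# the maximum number of features. `state` is a 0-1 encoding.
is_terminated = function(n_evals, state, max_evaluations, max_features) {
  n_evals >= max_evaluations || sum(state) >= max_features
}

is_terminated(10, c(1, 0, 1, 0), max_evaluations = 10, max_features = 8)  # TRUE (budget used up)
is_terminated(3,  c(1, 1, 1, 1), max_evaluations = 10, max_features = 4)  # TRUE (feature limit hit)
is_terminated(3,  c(1, 0, 1, 0), max_evaluations = 10, max_features = 8)  # FALSE
```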
We need to come up with an idea of how to present the results. At the moment there are just two basic functions, `get_result` and `get_optimization_path`, which print out the best feature combination or the steps of the feature selection.
Do these two incorporate all functionality of `mlr::getFeatSelResult()`?
If you mean by "come up with an idea how to present results" possible visualizations, this should go into a separate PR for mlr3viz.
See also `?mlr::plotFilterValues()`.
Can you please use a branch of mlr3featsel for this PR instead of your fork? Makes it easier to checkout the branch and run the code.
Does it make sense to have the helper function `binary_to_features`, which converts the 0-1 encoding to feature names, as a private method in `FeatureSelection`?
What is the current behavior? Does it return the 0/1 encoding?
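For reference, such a helper could be as small as the following sketch (hypothetical implementation, not the code in this PR):

```r
# Hypothetical helper: map a 0-1 state vector to the names of the
# selected features.
binary_to_features = function(state, feature_names) {
  stopifnot(length(state) == length(feature_names))
  feature_names[state == 1L]
}

binary_to_features(c(1, 0, 1), c("age", "glucose", "mass"))  # "age" "mass"
```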
Thanks for the good work. This looks really good and is a huge contribution.
Please add yourself to the DESCRIPTION file of the package as an author.
Please check the Travis errors.
Please always add a "fixes XY" as the first line of the PR so things are cross-linked and get closed automatically
[ ] Tests are needed
[ ] Examples in the functions are needed
Moved to #35
fixes #24
This is a basic implementation of mlr's `makeFeatSelControlRandom` and `makeFeatSelControlSequential`. The overall design is similar to mlr3tuning. I used many descriptions and some parts of the code from this package.

## Classes

### FeatureSelection*

* `generate_states` method that generates different feature combinations (states) in a 0-1 encoding.
* `FeatureSelectionRandom`: n combinations are generated depending on the `batch_size`.
* `FeatureSelectionForward`: all combinations of one step are generated.

### PerformanceEvaluator

* `evaluate_states` method that takes the states as an argument. For each state, the task with all features is cloned and a selection is applied based on the encoding of the state. All states are evaluated with `mlr3::benchmark`.
* After `evaluate_states`, the `states` are stored in a list entry in `self$states`.
* After `evaluate_states`, the `benchmark` object is stored in a list entry in `self$bmr`.
* `FeatureSelectionForward` is able to generate the path of the stepwise selection.

### Terminator

* Works like the `Terminator` class in mlr3tuning.
* `TerminatorPerformanceStep` is specially designed to work with `FeatureSelectionForward`. It compares the last two chosen states and terminates if the performance improvement is below a certain threshold.

## Discussion

* We need to come up with an idea of how to present the results. At the moment there are just two basic functions, `get_result` and `get_optimization_path`, which print out the best feature combination or the steps of the feature selection.
* Does it make sense to have the helper function `binary_to_features`, which converts the 0-1 encoding to feature names, as a private method in `FeatureSelection`?
* `max_features` is not implemented for `FeatureSelectionForward` because it is something the `TerminatorPerformanceStep` object needs to know. I need to come up with an elegant way to do this. Maybe we have to make `max_features` an argument for `TerminatorPerformanceStep` and remove it as a setting for `FeatureSelectionForward`.
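The comparison behind `TerminatorPerformanceStep` can be sketched as follows (a hypothetical standalone function, assuming a measure where higher is better, such as classification accuracy; not the actual class implementation):

```r
# Hypothetical sketch of the TerminatorPerformanceStep check: compare
# the performance of the last two chosen states and terminate once the
# improvement drops below the threshold.
step_terminated = function(perf_previous, perf_current, threshold = 0.01) {
  (perf_current - perf_previous) < threshold
}

step_terminated(0.74, 0.76)   # FALSE: improvement of 0.02 exceeds the threshold
step_terminated(0.76, 0.762)  # TRUE: improvement of 0.002 is below the threshold
```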