mlr-org / mlr3filters

Filter-based feature selection for mlr3
https://mlr3filters.mlr-org.com
GNU Lesser General Public License v3.0

Add wrapper methods random and forward feature selection #30

Closed: be-marc closed this pull request 5 years ago

be-marc commented 5 years ago

fixes #24

This is a basic implementation of mlr’s makeFeatSelControlRandom and makeFeatSelControlSequential. The overall design is similar to mlr3tuning, and I reused many descriptions and some parts of the code from that package.

Classes

FeatureSelection*

PerformanceEvaluator

Terminator

Discussion

be-marc commented 5 years ago

Example: FeatureSelectionRandom + TerminatorEvaluations

# mlr3 provides mlr_tasks, mlr_learners and mlr_resamplings
library(mlr3)

# Specify the task
task = mlr_tasks$get("boston_housing")

# Define the learner
learner = mlr_learners$get("regr.rpart")

# Choose resampling strategy
resampling = mlr_resamplings$get("cv", param_vals = list(folds = 5L))

# Specify performance evaluator
pe = PerformanceEvaluator$new(task = task,
                              learner = learner,
                              resampling = resampling)

# Specify terminator
tm = TerminatorEvaluations$new(max_evaluations = 10)

# Specify wrapper method
fs = FeatureSelectionRandom$new(pe = pe,
                                tm = tm,
                                batch_size = 10,
                                max_features = 8)

# Run feature selection
fs$calculate()

# Get best selection
fs$get_result()
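The random wrapper in the example above can be illustrated without the PR's classes. The following is a minimal base-R sketch, assuming that random feature selection draws subsets of at most max_features features in batches of batch_size, and that TerminatorEvaluations stops after max_evaluations evaluated subsets. The helpers cv_mse() and random_feature_selection() are hypothetical and belong neither to mlr3 nor to this PR; lm() on mtcars with 5-fold cross-validated MSE stands in for regr.rpart on boston_housing.

```r
# Hypothetical helper (not part of mlr3): 5-fold cross-validated MSE of lm()
# fitted on a feature subset; folds are assigned deterministically by row
cv_mse = function(data, target, features, folds = 5L) {
  fold_id = rep_len(seq_len(folds), nrow(data))
  errors = vapply(seq_len(folds), function(k) {
    train = data[fold_id != k, , drop = FALSE]
    test = data[fold_id == k, , drop = FALSE]
    fit = lm(reformulate(features, response = target), data = train)
    mean((test[[target]] - predict(fit, newdata = test))^2)
  }, numeric(1))
  mean(errors)
}

# Hypothetical sketch of random feature selection with an evaluation budget
random_feature_selection = function(data, target, max_evaluations = 10L,
                                    batch_size = 10L, max_features = 8L) {
  candidates = setdiff(names(data), target)
  features = character(0)
  mse = numeric(0)
  # Terminator: stop once max_evaluations subsets have been evaluated
  while (length(mse) < max_evaluations) {
    # Evaluate one batch without overshooting the evaluation budget
    n_batch = min(batch_size, max_evaluations - length(mse))
    for (i in seq_len(n_batch)) {
      # Draw a random subset of 1 to max_features candidate features
      chosen = sample(candidates, sample.int(max_features, 1L))
      features = c(features, paste(sort(chosen), collapse = ","))
      mse = c(mse, cv_mse(data, target, chosen))
    }
  }
  # Optimization path: one row per evaluated subset
  data.frame(features = features, mse = mse, stringsAsFactors = FALSE)
}

set.seed(1)
path = random_feature_selection(mtcars, "mpg", max_evaluations = 10L,
                                batch_size = 5L, max_features = 4L)
path[which.min(path$mse), ]  # best subset found (lowest CV MSE)
```

In this sketch each batch is capped at the remaining budget, so batch_size only controls how many subsets are drawn between termination checks, not the total number of evaluations.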
be-marc commented 5 years ago

Example: FeatureSelectionForward + TerminatorPerformanceStep

# mlr3 provides mlr_tasks, mlr_learners, mlr_resamplings and mlr_measures
library(mlr3)

# Specify the task
task = mlr_tasks$get("pima")

# Change measure
measures = mlr_measures$mget(c("classif.acc"))
task$measures = measures

# Define the learner
learner = mlr_learners$get("classif.rpart")

# Choose resampling strategy
resampling = mlr_resamplings$get("cv", param_vals = list(folds = 5L))

# Specify performance evaluator
pe = PerformanceEvaluator$new(task = task,
                              learner = learner,
                              resampling = resampling)

# Specify terminator
tm = TerminatorPerformanceStep$new(threshold = 0.01)

# Specify wrapper method
fs = FeatureSelectionForward$new(pe = pe, tm = tm)

# Run feature selection
fs$calculate()

# Get best selection
fs$get_result()

# Get optimization path
fs$get_optimization_path()
pat-s commented 5 years ago

> max_features is not implemented for FeatureSelectionForward because it is something the TerminatorPerformanceStep object needs to know. I need to come up with an elegant way to do this. Maybe we have to make max_features an argument of TerminatorPerformanceStep and remove it as a setting of FeatureSelectionForward.

Sounds reasonable. I think all kinds of termination criteria should go into the terminator, and max_features is definitely one of them. I would also add it to TerminatorEvaluations and document that it applies only to feature selection, not to hyperparameter tuning. This way FeatureSelectionForward can stay "simple", taking only pe and tm.
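To illustrate the proposal that every stopping rule, including max_features, lives in the terminator, here is a minimal base-R sketch of forward selection. The hypothetical make_terminator() combines a max_features cap with a performance-step threshold, so the selection function only receives the data and the terminator (loosely mirroring pe and tm). It assumes threshold is the minimum absolute decrease in cross-validated MSE required to accept another feature; the PR's TerminatorPerformanceStep may define it differently. lm() on mtcars stands in for the learner and task, and none of these names belong to mlr3.

```r
# Hypothetical helper (not part of mlr3): 5-fold cross-validated MSE of lm()
# fitted on a feature subset; folds are assigned deterministically by row
cv_mse = function(data, target, features, folds = 5L) {
  fold_id = rep_len(seq_len(folds), nrow(data))
  errors = vapply(seq_len(folds), function(k) {
    train = data[fold_id != k, , drop = FALSE]
    test = data[fold_id == k, , drop = FALSE]
    fit = lm(reformulate(features, response = target), data = train)
    mean((test[[target]] - predict(fit, newdata = test))^2)
  }, numeric(1))
  mean(errors)
}

# Hypothetical terminator holding every stopping rule: returns TRUE to stop
make_terminator = function(threshold = 0.01, max_features = Inf) {
  function(n_selected, old_mse, new_mse) {
    n_selected >= max_features || (old_mse - new_mse) < threshold
  }
}

# Hypothetical forward selection: the terminator alone decides when to stop
forward_feature_selection = function(data, target, terminate) {
  candidates = setdiff(names(data), target)
  selected = character(0)
  best_mse = Inf
  path_features = character(0)
  path_mse = numeric(0)
  repeat {
    remaining = setdiff(candidates, selected)
    if (length(remaining) == 0L) break
    # Score every one-feature extension of the current subset
    scores = vapply(remaining, function(f) cv_mse(data, target, c(selected, f)),
                    numeric(1))
    # Ask the terminator whether adding the best candidate is still worthwhile
    if (terminate(length(selected), best_mse, min(scores))) break
    selected = c(selected, remaining[which.min(scores)])
    best_mse = min(scores)
    path_features = c(path_features, paste(selected, collapse = ","))
    path_mse = c(path_mse, best_mse)
  }
  list(features = selected, mse = best_mse,
       path = data.frame(step = seq_along(path_mse), features = path_features,
                         mse = path_mse, stringsAsFactors = FALSE))
}

tm = make_terminator(threshold = 0.01, max_features = 3)
result = forward_feature_selection(mtcars, "mpg", tm)
result$features  # selected features, in order of inclusion
result$path      # optimization path: one row per accepted step
```

Because the first step compares against an initial MSE of Inf, this sketch always adds at least one feature unless max_features is 0.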

> We need to come up with an idea of how to present the results. At the moment there are just two basic functions, get_result and get_optimization_path, which print the best feature combination and the steps of the feature selection, respectively.

Do these two cover all the functionality of mlr::getFeatSelResult()? If by "come up with an idea how to present results" you mean visualizations, those should go into a separate PR for mlr3viz. See also ?mlr::plotFilterValues.

Misc

be-marc commented 5 years ago

Moved to #35