Feature importance calculation

mlr-org / mlr3filters

Filter-based feature selection for mlr3

https://mlr3filters.mlr-org.com

GNU Lesser General Public License v3.0


Feature importance calculation #33

Closed larskotthoff closed 5 years ago

larskotthoff commented 5 years ago

The name for the function to calculate a ranking for feature importance is calculate() -- this is very non-descriptive and non-intuitive. I propose to rename to ranking().

pat-s commented 5 years ago

(I moved this issue to mlr3featsel since mlr3 does not do feature selection per se)

Currently this is only used for Filters and basically means "calculate the filter values".

We are thinking about making .$calculate() internal and calling it during construction, i.e. in filter$new(), so that .$calculate() (or whatever it ends up being called) becomes a private method.

I do not like rank() so much, since this implies that it does "only" some ranking whereas in fact the generation/calculation of filter values is happening and the ranking part is only a subset of what is actually happening.

In fact, we are not even sure if .$calculate() should do a ranking or just return the values in a shuffled order and the ranking is applied somewhere else.

For "wrapper methods", which are often also referred to as "feature selection", we could maybe use a method called .$run() or similar? That would be more generic.
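
To make the "internal calculation during construction" idea concrete, here is a minimal R6 sketch. It is not the mlr3filters implementation: the class name SketchFilter, its constructor arguments, and the use of per-feature variance as the score are all assumptions chosen for illustration.

```r
library(R6)

# Hypothetical class for illustration only; not part of mlr3filters.
SketchFilter = R6Class("SketchFilter",
  public = list(
    # Filter values, filled in once at construction time.
    scores = NULL,

    initialize = function(data, target) {
      # The calculation is triggered here, so users never call it directly.
      self$scores = private$.calculate(data, target)
    }
  ),
  private = list(
    # Private method: computes one score per feature (here: its variance)
    # and returns the scores in decreasing order.
    .calculate = function(data, target) {
      features = setdiff(names(data), target)
      scores = vapply(features, function(f) stats::var(data[[f]]), numeric(1))
      sort(scores, decreasing = TRUE)
    }
  )
)

f = SketchFilter$new(iris, target = "Species")
f$scores
```

With this layout the values are read from the public `scores` field, and the private method is not reachable from outside the object.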

larskotthoff commented 5 years ago

I like the idea of making it internal and calling it automatically, but then the name of the function to get the values should arguably be get().

Is there any scenario where it makes sense to get the values not in rank order? If not, I don't see why there should be a separate function to do that.

pat-s commented 5 years ago

> I like the idea of making it internal and calling it automatically, but then the name of the function to get the values should arguably be get().

Values are simply stored in filter$scores, so there is no getter needed.

> Is there any scenario where it makes sense to get the values not in rank order? If not, I don't see why there should be a separate function to do that.

I do not know of any right now. Michel coded it like that explicitly for some reason, maybe because this is also the behavior of mlr? generateFilterValuesData() returns the values shuffled, only filterFeatures() is doing the ranking.

pat-s commented 5 years ago

@larskotthoff see https://github.com/mlr-org/mlr3featsel/issues/28#issuecomment-503750193

pat-s commented 5 years ago

The functionality has been trimmed down to the following:

Filter$calculate(task) calculates the filter values and stores them in $scores.
The resulting data.table can be extracted via as.data.table() and then post-processed as usual, e.g. head(as.data.table(filter), 3).
All Task operations (e.g. subsetting) should be done with mlr3pipelines or manually using the operators of class Task.
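
The workflow described above could look roughly like the following sketch. It assumes current releases of mlr3 and mlr3filters are installed. The choice of the variance filter and the iris task is only for illustration, and the "feature" column name is taken from the current as.data.table() output rather than from this thread.

```r
library(mlr3)
library(mlr3filters)

# Example task shipped with mlr3 (an illustrative choice).
task = tsk("iris")

# Variance filter, chosen here because it needs no extra packages.
filter = FilterVariance$new()

# Calculate the filter values; they are stored in filter$scores.
filter$calculate(task)
filter$scores

# Extract the scores as a data.table and post-process as usual.
top3 = head(as.data.table(filter), 3)
top3

# Subset the task manually with a Task operator, keeping the top features.
task$select(top3$feature)
task$feature_names
```

Note that task$select() modifies the task in place, so clone the task first (task$clone()) if the original feature set is still needed.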