Closed larskotthoff closed 5 years ago
(I moved this issue to mlr3featsel since mlr3 does not do feature selection per se.)
Currently this is only used for Filters and basically means "calculate the filter values".
We are thinking about making .$calculate() internal and calling it during construction, i.e. in filter$new(), so that .$calculate() (or whatever it ends up being called) becomes a private method.
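A minimal sketch of what "internal and called during construction" could look like, assuming an R6 class. ToyFilter, its data argument, and the use of per-column variance as the filter value are hypothetical illustrations, not the mlr3featsel implementation:

```r
library(R6)

# Hypothetical toy class, only to illustrate the proposal.
ToyFilter = R6Class("ToyFilter",
  public = list(
    scores = NULL,
    # Construction triggers the calculation, so users never call it directly.
    initialize = function(data) {
      self$scores = private$.calculate(data)
    }
  ),
  private = list(
    # Placeholder filter value: the variance of each numeric column.
    .calculate = function(data) {
      vapply(data, stats::var, numeric(1))
    }
  )
)

f = ToyFilter$new(iris[, 1:4])
f$scores  # named numeric vector, one value per feature
```

With this layout the calculation method is not reachable from outside the object; only the stored scores are.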
I do not like rank() so much, since it implies that the method does "only" some ranking, whereas in fact it generates the filter values, and the ranking is only a subset of what actually happens.
In fact, we are not even sure whether .$calculate() should do a ranking or just return the values in a shuffled order, with the ranking applied somewhere else.
For "wrapper methods", which are often also referred to as "feature selection", we could maybe use a function called .$run() or similar? That might be more generic.
I like the idea of making it internal and calling it automatically, but then the name of the function to get the values should arguably be get().
Is there any scenario where it makes sense to get the values not in rank order? If not, I don't see why there should be a separate function to do that.
> I like the idea of making it internal and calling it automatically, but then the name of the function to get the values should arguably be get().
Values are simply stored in filter$scores, so no getter is needed.
> Is there any scenario where it makes sense to get the values not in rank order? If not, I don't see why there should be a separate function to do that.
I do not know of any right now. Michel coded it like that explicitly for some reason, maybe because this is also the behavior of mlr? generateFilterValuesData() returns the values shuffled; only filterFeatures() does the ranking.
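To make the split concrete, here is a rough base R sketch that keeps "compute the values" and "rank them" as two separate steps. It does not call mlr; the variance score and the mtcars columns are just placeholders:

```r
# Step 1: compute one value per feature; the order follows the columns, not the scores.
scores = vapply(mtcars[, c("mpg", "hp", "wt")], stats::var, numeric(1))

# Step 2, kept separate: rank by sorting in decreasing order and pick the top features.
ranked = sort(scores, decreasing = TRUE)
top2 = names(ranked)[1:2]
```

Keeping the unsorted values around costs nothing here, since the ranking can always be derived from them afterwards.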
@larskotthoff see https://github.com/mlr-org/mlr3featsel/issues/28#issuecomment-503750193
The functionality has been trimmed down to the following:
- Filter$calculate(task) calculates the filter values and stores them in $scores.
- The resulting data.table can be extracted via as.data.table() and then post-processed as usual, e.g. head(as.data.table(filter), 3).
- All Task operations (e.g. subsetting) should be done with mlr3pipelines or manually using the operators of class Task.
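The trimmed-down workflow can be imitated with a small stand-in, sketched here under several assumptions: ToyFilter is hypothetical, the "task" is a plain data.frame, as.data.frame() takes the place of as.data.table() to avoid a package dependency, and sorting by score on extraction is my assumption rather than something stated above.

```r
library(R6)

# Hypothetical stand-in for a filter; the "task" is just a data.frame here.
ToyFilter = R6Class("ToyFilter",
  public = list(
    scores = NULL,
    # Calculates the filter values (placeholder: variance) and stores them in $scores.
    calculate = function(task) {
      self$scores = vapply(task, stats::var, numeric(1))
      invisible(self)
    }
  )
)

# S3 method so the scores can be extracted as a table, sorted by score (assumption).
as.data.frame.ToyFilter = function(x, ...) {
  s = sort(x$scores, decreasing = TRUE)
  data.frame(feature = names(s), score = unname(s))
}

filter = ToyFilter$new()
filter$calculate(iris[, 1:4])
top3 = head(as.data.frame(filter), 3)
```

Any subsetting of the data happens outside the filter object, in line with the last point of the list.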
The name for the function to calculate a ranking for feature importance is calculate(). This is very non-descriptive and non-intuitive. I propose to rename it to ranking().