Open pat-s opened 5 years ago
For the filters: I'd start with stats / no pkg, then try to connect the modern filter packages (FSelectorRcpp and praznik).
We don't need 3 forest filters, we can solve this more generically by extending mlr3 learners with methods to extract feature scores.
This is really important to be able to use all kinds of embedded feature selection directly by the learner.
I don't really see a reason to use the Java FSelector package when there is FSelectorRcpp.
The latter does not have all the filters of the former. See https://mlr.mlr-org.com/articles/tutorial/filter_methods.html.
Well, with this argument we have to include all possible filters :smile:
I would suggest we start without it, and if people complain/open issues we can still add them later. Or are there any really important filters not yet in FSelectorRcpp?
There never is (and never should be) pressure to include everything; include what is most important.
My comment was meant more as a comparison, not a statement that we should do it :)
NB: All learners which have some sort of "importance" are now supported via FilterVariableImportance.
Filters
Pkg
No pkg
[x] AUC
[ ] generic permutation
[x] univariate.model.score
stats
[x] anova
[x] kruskal
[x] linear.correlation
[x] rank.correlation
[x] variance
FSelector
Do we want to include these filters again? They are slow and have Java problems.
FSelectorRcpp
Learner integrated filters
[x] ranger.impurity
[x] ranger.permutation
[x] cforest.importance
Do we want to add the randomForest and randomForestSRC ones?
mRMRe
[ ] mrmr -> slow and no support for classif tasks https://github.com/mlr-org/mlr/issues/2604
praznik
[x] CMIM
[x] DISR
[x] JMI
[x] JMIM
[x] MIM
[x] MRMR
[x] NJMIM
care
spFSR
Need to check.
Ensemble filters
[ ] Min
[ ] Mean
[ ] Median
[ ] Max
[ ] Borda
[ ] Borda-staircase
[ ] Borda-power
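
For discussion, here is a rough sketch of what the simpler ensemble filters above could compute. It is written in plain Python only to pin down the idea and is not the mlr or mlr3 implementation. It assumes each base filter returns one score per feature with higher meaning better, converts the scores to ranks, and aggregates the ranks per feature. Borda is sketched as the sum of ranks, which gives the same ordering as summing Borda points. The names `rank_scores`, `ensemble_ranks`, and the example filters and features are hypothetical. Borda-staircase and Borda-power are left out.

```python
from statistics import mean, median


def rank_scores(scores):
    """Map each feature to its rank; rank 1 is the highest score.

    Ties are broken alphabetically by feature name (an arbitrary choice).
    """
    ordered = sorted(scores, key=lambda f: (-scores[f], f))
    return {feature: i + 1 for i, feature in enumerate(ordered)}


def ensemble_ranks(filter_scores, how="mean"):
    """Order features by an aggregate of their ranks across base filters.

    filter_scores: dict of filter name -> dict of feature -> score.
    how: one of "min", "mean", "median", "max", "borda".
    Returns feature names, best first (lowest aggregated rank first).
    """
    aggregators = {
        "min": min,
        "mean": mean,
        "median": median,
        "max": max,
        "borda": sum,  # sum of ranks; same ordering as summing Borda points
    }
    aggregate = aggregators[how]
    per_filter = [rank_scores(s) for s in filter_scores.values()]
    features = sorted(per_filter[0])
    combined = {f: aggregate([r[f] for r in per_filter]) for f in features}
    return sorted(combined, key=lambda f: (combined[f], f))


# Made-up scores from three hypothetical base filters.
example_scores = {
    "filter_a": {"x1": 0.9, "x2": 0.5, "x3": 0.1},
    "filter_b": {"x1": 0.2, "x2": 0.8, "x3": 0.4},
    "filter_c": {"x1": 0.7, "x2": 0.6, "x3": 0.3},
}

for method in ("min", "mean", "median", "max", "borda"):
    print(method, ensemble_ranks(example_scores, how=method))
```

Aggregating ranks rather than raw scores sidesteps the question of how to put scores from different filters on a common scale, at the cost of discarding the score magnitudes.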