[LRNRQ] Add ABESS from package abess

bbayukari commented 2 years ago

Algorithm

Adaptive BEst Subset Selection

Package

abess

Supported types

[x] classif
[ ] clust
[ ] dens
[x] regr
[x] surv

I have checked that this is not already implemented in

[x] mlr3
[x] mlr3learners
[x] mlr3extralearners
[x] Other core packages (e.g. mlr3proba, mlr3keras)

Why do I think this is a useful learner?

ABESS is a generic algorithm framework that solves the best subset selection problem with high accuracy in a short time. It currently works with these models: linear regression, multi-linear regression, classification (binary and multi-class), Cox regression, etc.

In fact, ABESS supports almost all GLMs (generalized linear models) and M-estimators whose objective functions satisfy strong convexity and smoothness. We have implemented the GLMs we consider important, as well as other convex models like PCA.

As a feature selection algorithm, ABESS runs quickly and gives accurate results. Specific introductions and experiments can be found here.

Further Optional Comments

I have two questions and am not sure how best to handle them.

As described above, ABESS is a generic algorithm framework rather than a single learner, and it will support more models in the future. So I would like to request several learners, such as 'abess.gaussian', 'abess.logistic', etc. Is this OK?

Although ABESS can estimate parameters and predict under some models just like a normal learner, the most important function of ABESS is to select features, which is the only function that can be guaranteed to be available. For example, ABESS principal component analysis (abessPCA) doesn't aim to predict anything but seeks principal components under a sparsity constraint, so it isn't a normal learner. How would it work with mlr3?

Looking forward to any suggestions!

sebffischer commented 2 years ago

Hey @bbayukari, thanks for raising this issue and for your interest in mlr3!

With respect to different learners: Yes, this would be ok for me. What speaks against making one abess learner with a parameter that chooses between the different models?
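To make the single-learner option concrete, here is a minimal sketch of a parameter set with a `family` parameter, written with the paradox package that mlr3 learners use for their hyperparameters. The parameter name and its levels are illustrative assumptions, not an existing mlr3 learner definition. Since every mlr3 learner has exactly one task type, such a parameter could only switch between models of the same type (for example, among regression families).

```r
library(paradox)

# Hypothetical parameter set for a single regression-type abess learner.
# The levels listed here are assumptions chosen for illustration.
param_set = ps(
  family = p_fct(
    levels = c("gaussian", "poisson", "gamma"),
    default = "gaussian",
    tags = "train"
  )
)

# Choosing a model then amounts to setting one hyperparameter value.
param_set$values$family = "poisson"
print(param_set$values)
```

With this design, classification and survival models would still need their own learner classes, each with its own `family`-style parameter where more than one model applies.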

If it is mostly about feature selection, it might be a better fit for mlr3fselect. I am not really familiar with the mlr3fselect package myself, but conceptually it belongs there. I am not sure, however, whether the class structure is set up in a way that allows for an easy extension.

A more generic object in mlr3 is the PipeOp (https://github.com/mlr-org/mlr3pipelines/pulls), which would also allow implementing something like feature selection. Are you familiar with mlr3pipelines?

Note also that learners can have the property "selected_features". When a learner has this property (e.g. trees like rpart), it has to implement a method $selected_features(), which can be called after training and returns a character vector containing the selected features.
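As a minimal sketch of this property, assuming the mlr3 and rpart packages are installed, the following trains the built-in rpart classifier on the `sonar` example task and queries the features it used. An ABESS learner would be expected to expose its selected subset the same way; that part is an assumption, since no such learner exists yet.

```r
library(mlr3)

task = tsk("sonar")
learner = lrn("classif.rpart")

# The property is advertised in the learner's property vector.
has_property = "selected_features" %in% learner$properties

learner$train(task)

# Character vector of the features used in the fitted tree's splits.
selected = learner$selected_features()
print(selected)
```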

sebffischer commented 2 years ago

@be-marc The current structure of mlr3fselect does not easily allow including an off-the-shelf feature selection algorithm, or am I wrong?

be-marc commented 2 years ago

> @be-marc The current structure of mlr3fselect does not easily allow including an off-the-shelf feature selection algorithm, or am I wrong?

Yes, mlr3fselect is only for wrapper feature selection.

> Although ABESS can estimate parameters and predict under some models just like a normal learner, the most important function of ABESS is to select features, which is the only function that can be guaranteed to be available.

Then it makes sense to use a PipeOp that outputs a subsetted task.
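A rough sketch of what "a PipeOp that outputs a subsetted task" looks like, using the existing `po("select")` operator from mlr3pipelines. The vector `chosen` is a hypothetical stand-in for the output of a selection algorithm; a dedicated ABESS PipeOp would instead determine the subset during training, and that custom class is not shown here.

```r
library(mlr3)
library(mlr3pipelines)

task = tsk("sonar")

# Hypothetical result of a feature selection step; in practice this vector
# would come from the selection algorithm rather than being hard-coded.
chosen = c("V10", "V11", "V12")

po_select = po("select", selector = selector_name(chosen))

# PipeOps take and return lists; the single output is the subsetted task.
task_subset = po_select$train(list(task))[[1]]
print(task_subset$feature_names)
```

Any learner placed after this operator in a pipeline would then see only the selected features.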

sebffischer commented 2 years ago

Is there still interest in this issue?
