haifengl / smile

Statistical Machine Intelligence & Learning Engine
https://haifengl.github.io

Extend FeatureRanking interface for regression tasks #684

Open sabbatinif opened 3 years ago

sabbatinif commented 3 years ago

It may be useful to have a feature ranking procedure applicable not only to classification tasks (e.g. SignalNoiseRatio and SumSquaresRatio, implementing the FeatureRanking interface), but also to regression tasks. At the moment the FeatureRanking interface only accepts integer target vectors for calculating the feature rank.

haifengl commented 2 years ago

What feature selection criteria for regression are of interest?

sabbatinif commented 2 years ago

I have no strong preferences about the criteria. I can suggest something similar to Python scikit-learn's feature_selection.f_regression. It consists of a sequential algorithm that iteratively and greedily selects the most relevant features of a dataset. It starts by training a temporary regressor on a single feature (the one most correlated with the output values) and keeps repeating this operation, adding one feature at a time and always picking the one that most increases the temporary regressor's predictive performance. At the end of this process, features are ranked on the basis of their relevance. But any other criterion would be useful to me.
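
For illustration, the greedy procedure described above could be sketched roughly as follows. This is only a sketch under stated assumptions, not Smile's API and not scikit-learn's implementation: the class name `ForwardRegressionRanking` and its `rank(double[][], double[])` signature are hypothetical, the "temporary regressor" is assumed to be ordinary least squares with an intercept, and "predictive performance" is assumed to mean the reduction in the residual sum of squares on the training data. Note also that scikit-learn documents `f_regression` as a univariate F-test scorer applied to each feature separately; the step-by-step procedure described here is closer to greedy forward (sequential) selection.

```java
/** Hypothetical sketch of a feature ranking for regression targets; not part of Smile. */
public class ForwardRegressionRanking {

    /**
     * Ranks features by greedy forward selection for a least-squares fit.
     * Returns the column indices of x, most relevant first.
     */
    public static int[] rank(double[][] x, double[] y) {
        int n = x.length, p = x[0].length;

        // Column-major, mean-centered copies, so an intercept is implied.
        double[][] z = new double[p][n];
        for (int j = 0; j < p; j++) {
            for (int i = 0; i < n; i++) z[j][i] = x[i][j];
            center(z[j]);
        }
        double[] r = y.clone(); // current residual of the target
        center(r);

        boolean[] used = new boolean[p];
        int[] order = new int[p];

        for (int step = 0; step < p; step++) {
            int best = -1;
            double bestGain = -1.0;
            for (int j = 0; j < p; j++) {
                if (used[j]) continue;
                double norm = dot(z[j], z[j]);
                // Drop in residual sum of squares if feature j were added now.
                // The absolute tolerance is an assumption and is scale dependent.
                double gain = 0.0;
                if (norm > 1e-12) {
                    double c = dot(z[j], r);
                    gain = c * c / norm;
                }
                if (gain > bestGain) {
                    bestGain = gain;
                    best = j;
                }
            }
            used[best] = true;
            order[step] = best;

            double qq = dot(z[best], z[best]);
            if (qq <= 1e-12) continue; // constant or redundant column: nothing to remove

            // Remove the chosen direction from the residual and from the remaining
            // features, so later gains are measured given the features already selected.
            double[] q = z[best];
            removeComponent(r, q, qq);
            for (int j = 0; j < p; j++) {
                if (!used[j]) removeComponent(z[j], q, qq);
            }
        }
        return order;
    }

    private static void center(double[] v) {
        double mean = 0.0;
        for (double a : v) mean += a;
        mean /= v.length;
        for (int i = 0; i < v.length; i++) v[i] -= mean;
    }

    private static double dot(double[] a, double[] b) {
        double s = 0.0;
        for (int i = 0; i < a.length; i++) s += a[i] * b[i];
        return s;
    }

    // v <- v - (q.v / q.q) q
    private static void removeComponent(double[] v, double[] q, double qq) {
        double coef = dot(q, v) / qq;
        for (int i = 0; i < v.length; i++) v[i] -= coef * q[i];
    }
}
```

The only difference from a classification-style ranking signature is that the target is a `double[]` rather than an `int[]`. Orthogonalizing the remaining columns against each selected feature avoids refitting a full regressor at every step while, for least squares, selecting the same feature a refit would.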