riken-aip / pyHSICLasso

Versatile Nonlinear Feature Selection Algorithm for High-dimensional Data
MIT License

As a predictor? #38

Closed Mehrzadind closed 3 years ago

Mehrzadind commented 4 years ago

Hey, that's awesome and I'm trying to use it in my thesis, but may I ask how to use it as a classifier? I've looked through the whole code, but how do I fit it to different subsets and get an overall precision score?

teyden commented 3 years ago

I'm not a contributor, but I've forked the code and modified it a bit, so I have some understanding. The idea of HSIC-LASSO is feature selection; how you use it for classification is up to you. HSIC-LASSO can be used for feature selection in regression or classification problems, but from my understanding it does not act as a classifier on its own. Anyone who knows better, please feel free to correct me.

How I've used it is through a two-step process: 1) perform feature selection for a classification problem using HSIC-LASSO (such as in this example), and 2) take the selected features and transform your original data (subset your original X matrix to contain only the selected features), then re-test with a classifier such as those in scikit-learn to obtain performance metrics such as precision and accuracy.

There are different ways to input your data and run feature selection (regression vs. classification), so check the API (which unfortunately isn't fully or thoroughly documented, but the API code shows pretty clearly how you can use it). A rough sketch of the two-step process follows below.
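Going by my reading of the API code (I'm assuming `input`, `classification` and `get_index` are still the method names, so double-check against the current source), the workflow looks something like this:

```python
from pyHSICLasso import HSICLasso
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Toy data standing in for your own X (n_samples x n_features) and class labels y
X, y = make_classification(n_samples=200, n_features=100, n_informative=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# Step 1: feature selection with HSIC-LASSO (classification variant)
hsic_lasso = HSICLasso()
hsic_lasso.input(X_train, y_train)   # numpy input; CSV/MATLAB input also supported
hsic_lasso.classification(10)        # keep the top 10 features
selected = hsic_lasso.get_index()    # indices of the selected features

# Step 2: subset the original matrices to the selected features,
# then re-test with any scikit-learn classifier
X_train_sel = X_train[:, selected]
X_test_sel = X_test[:, selected]
```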

hclimente commented 3 years ago

Sorry for our late reply. @teyden is right: the current implementation does not allow predicting new samples. When we use HSIC Lasso, we follow the procedure described by @teyden. The only clarification is that we favour a classifier that can handle nonlinear relationships, like a random forest or a kernel SVM.
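For example, continuing the sketch above, the re-testing step with a nonlinear classifier could look like this (random forest and an RBF-kernel SVM shown, both from scikit-learn):

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.metrics import classification_report

# Random forest on the HSIC-LASSO-selected features
rf = RandomForestClassifier(n_estimators=200, random_state=0)
rf.fit(X_train_sel, y_train)
print(classification_report(y_test, rf.predict(X_test_sel)))

# Kernel SVM (RBF kernel) as an alternative nonlinear classifier
svm = SVC(kernel="rbf", C=1.0, gamma="scale")
svm.fit(X_train_sel, y_train)
print(classification_report(y_test, svm.predict(X_test_sel)))
```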

Mehrzadind commented 3 years ago

Thanks, but what about evaluation of the selected features? Is there any specific method? What I have done in my thesis for evaluation is to evaluate classification models such as RF, XGB, etc. on the subset of features selected by BHSIC. But I think there should be a specific evaluation process.

teyden commented 3 years ago

Can you provide an example of what you mean by the "evaluation" of selected features?

As I described, using the selected features for classification (with RF or SVM, etc.) lets you evaluate how predictive a reduced set of features is. Another way to evaluate features is to look at the measure of strength/support indicating that a feature is important in a model such as HSIC-LASSO. The features selected by HSIC-LASSO come with their own importance rankings, which I find helpful.
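For example, continuing the earlier sketch (and assuming `get_index_score` is the accessor for the importance scores, which you should verify against the code), you could print the ranking and compare cross-validated accuracy on the selected features versus all features:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Importance ranking of the selected features
# (get_index_score is my reading of the API -- please verify against the source)
selected = hsic_lasso.get_index()
scores = hsic_lasso.get_index_score()
for idx, score in zip(selected, scores):
    print(f"feature {idx}: importance {score:.4f}")

# One way to evaluate the reduced feature set: cross-validated accuracy
# of a classifier on the selected columns vs. all columns
clf = RandomForestClassifier(n_estimators=200, random_state=0)
acc_selected = cross_val_score(clf, X[:, selected], y, cv=5).mean()
acc_all = cross_val_score(clf, X, y, cv=5).mean()
print(f"selected features: {acc_selected:.3f}   all features: {acc_all:.3f}")
```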
