riken-aip / pyHSICLasso

Versatile Nonlinear Feature Selection Algorithm for High-dimensional Data
MIT License

Is there a way to extract the predicted value of the trained HSIC Lasso (Regression)? #44

Closed: srivas53 closed this issue 10 months ago

srivas53 commented 1 year ago

After HSIC Lasso (Regression) finishes, we have a beta value for every feature in the training dataset. Is there a way to use these to obtain the predicted value for a given instance? I am trying to evaluate the model fit via mean squared error, as done in the original paper (High-Dimensional Feature Selection by Feature-Wise Kernelized Lasso, Section 4.3.2).

mattrmd commented 10 months ago

@srivas53 My understanding is that the authors did not "extract the predicted value of the trained HSIC Lasso" model. Instead, they trained a new model on the features selected by HSIC Lasso:

> We use kernel regression (KR) with the Gaussian kernel for evaluating the mean squared error and the mean correlation when the top $m = 10, 20, ..., 50$ features selected by each method [(including HSIC)] are used. We first choose $50$ features and then use top $m = 10, 20, ..., 50$ features having the largest absolute regression coefficients.
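The evaluation protocol in that quote can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' code. "Kernel regression" is taken here to mean Gaussian kernel ridge regression. The bandwidth `sigma=1.0` and ridge `lam=1e-2` are arbitrary placeholders, not the paper's settings. `sel_idx` and `coefs` are the selected feature indices and their coefficients, assumed here to come from pyHSICLasso's `get_index()` and `get_index_score()`. All helper names are hypothetical.

```python
import numpy as np


def gaussian_kernel(A, B, sigma):
    """Gaussian (RBF) kernel matrix between the rows of A and the rows of B."""
    sq_dists = (
        np.sum(A**2, axis=1)[:, None]
        + np.sum(B**2, axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    # Clamp tiny negative values caused by floating-point round-off.
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * sigma**2))


def krr_fit_predict(X_train, y_train, X_test, sigma=1.0, lam=1e-2):
    """Fit Gaussian kernel ridge regression on the training set, predict X_test."""
    K = gaussian_kernel(X_train, X_train, sigma)
    n = K.shape[0]
    # Dual coefficients: (K + lam * n * I)^{-1} y
    alpha = np.linalg.solve(K + lam * n * np.eye(n), y_train)
    return gaussian_kernel(X_test, X_train, sigma) @ alpha


def top_m_mse(X_train, y_train, X_test, y_test, sel_idx, coefs, ms):
    """Test MSE of a fresh KRR model trained on only the top-m selected features."""
    sel_idx = np.asarray(sel_idx)
    # Rank selected features by the absolute value of their coefficients.
    ranked = sel_idx[np.argsort(-np.abs(np.asarray(coefs)))]
    results = {}
    for m in ms:
        cols = ranked[:m]
        y_pred = krr_fit_predict(X_train[:, cols], y_train, X_test[:, cols])
        results[m] = float(np.mean((y_test - y_pred) ** 2))
    return results


# Hypothetical usage with your own train/test split and the paper's m values:
# mse_by_m = top_m_mse(x_train, y_train, x_test, y_test,
#                      hsic_lasso.get_index(), hsic_lasso.get_index_score(),
#                      ms=[10, 20, 30, 40, 50])
```

The point of the design is that HSIC Lasso only ranks features. The MSE comes entirely from the separate downstream regressor, which is refit for each $m$.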

EDIT: If you want the features selected by HSIC Lasso, you can get their indices with:

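```python
from pyHSICLasso import HSICLasso

# ...  (x_train: samples x features array, y_train: target values)
hsic_lasso = HSICLasso()

hsic_lasso.input(x_train, y_train)
hsic_lasso.regression(5)  # select the top 5 features

sel_feats = hsic_lasso.get_index()  # column indices of the selected features
```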
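If you do want predicted values, the approach described above amounts to fitting a separate regressor on the columns listed in `sel_feats`. The sketch below is an illustration under assumptions, not part of pyHSICLasso. `fit_on_selected`, `predict_on_selected`, and `mse` are hypothetical helper names. Plain linear least squares (NumPy `lstsq`) stands in for whatever regressor you prefer; for nonlinear targets, a kernel model like the one in the paper is likely a better fit.

```python
import numpy as np


def fit_on_selected(x_train, y_train, sel_feats):
    """Fit a linear least-squares model (with intercept) on the selected columns."""
    Xs = np.asarray(x_train)[:, sel_feats]
    A = np.column_stack([np.ones(len(Xs)), Xs])  # prepend an intercept column
    w, *_ = np.linalg.lstsq(A, np.asarray(y_train), rcond=None)
    return w  # w[0] is the intercept, w[1:] the per-feature weights


def predict_on_selected(w, x_new, sel_feats):
    """Predict with weights from fit_on_selected, using the same selected columns."""
    Xs = np.asarray(x_new)[:, sel_feats]
    return w[0] + Xs @ w[1:]


def mse(y_true, y_pred):
    """Mean squared error between targets and predictions."""
    return float(np.mean((np.asarray(y_true) - y_pred) ** 2))


# Hypothetical usage after running the snippet above:
# w = fit_on_selected(x_train, y_train, sel_feats)
# print(mse(y_test, predict_on_selected(w, x_test, sel_feats)))
```

Keeping selection and prediction separate means you can swap the downstream model without rerunning HSIC Lasso.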
hclimente commented 10 months ago

That's correct, @mattrmd. Thanks for your contribution.