gregbellan / Stabl

BSD 3-Clause Clear License

Unexpected (?) behavior #6

Closed mshqn closed 8 months ago

mshqn commented 8 months ago

First of all, thank you for the interesting paper and the package! Your results on correlated data were promising, so I wanted to try Stabl on my data, which suffers from multicollinearity.

I got some pretty weird results (the output contained none of the features that other feature-selection methods frequently select), so I wanted to try Stabl on a simple make_classification problem.

Here is my regular Lasso:

import pandas as pd
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# L1-penalized (Lasso-type) logistic regression
model = LogisticRegression(penalty="l1", max_iter=int(1e6), solver="liblinear")

# 10 features: 5 informative, 5 redundant (random linear combinations of the informative ones)
X, y = make_classification(n_features=10, n_informative=5, n_redundant=5, random_state=1)
X = pd.DataFrame(X)
y = pd.Series(y)
model.fit(X, y)
model.coef_

array([[ 0. , -0.54880256, 0. , -1.978292 , -0.64994893, -1.26612462, 0. , -0.72129673, 0. , 0. ]])

And here is what I get with Stabl:

import numpy as np
from stabl.stabl import Stabl, plot_stabl_path

stabl = Stabl(
    model,
    lambda_grid={"C": np.linspace(0.00001, 10, 100)},
    n_bootstraps=1000,
    artificial_type="knockoff",
    verbose=0,
    random_state=1,
)
stabl.fit(X, y)
plot_stabl_path(stabl)
stabl.get_feature_names_out()

array(['x3', 'x5', 'x7'], dtype=object)

[stability path plot from plot_stabl_path not shown]

How can we interpret the fact that Stabl selects only 3 features when 5 are informative?

xavdurand commented 8 months ago

Hi @mshqn ,

I am glad to see that you are interested in using Stabl. The stability path is indicative of an overly broad exploration of the penalty parameter $\lambda$: the selection frequencies of the chosen features decrease drastically after surpassing the threshold.

To obtain a more interpretable result, you need to reduce C_max, for example to 2; you will then obtain the following result.

stabl = Stabl(
    model,
    lambda_grid={"C": np.linspace(0.00001, 2, 100)},
    n_bootstraps=1000,
    artificial_type="knockoff",
    verbose=1,
    random_state=1,
)
stabl.fit(X, y)
plot_stabl_path(stabl)
stabl.get_feature_names_out()

Selected features: array(['x1', 'x3', 'x4', 'x5', 'x7'], dtype=object)

[stability path plot from plot_stabl_path not shown]

The informative features are normally ['x0', 'x1', 'x2', 'x3', 'x4'], which is close to the result. By construction, x5 and x7 are redundant features, i.e. they are generated as random linear combinations of the informative features, so they can be informative in their own right. In this example, it means the variable sets ['x0', 'x1', 'x2', 'x3', 'x4'] and ['x1', 'x3', 'x4', 'x5', 'x7'] are interchangeable in a multivariate analysis, as they provide the same information.
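A quick way to see this redundancy (an illustration added here, not part of the original exchange) is to check the numerical rank of the generated matrix: with 5 informative and 5 redundant columns, the 10-column matrix should have rank 5.

import numpy as np

# The redundant columns are exact linear combinations of the informative ones,
# so the column rank should equal n_informative (5) rather than n_features (10).
print(np.linalg.matrix_rank(X.values))  # expected: 5 for this make_classification setup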

mshqn commented 8 months ago

Hi xavdurand,

Thanks for your answer. This is actually strange, as larger C values correspond to weaker regularization, so we could have expected an increase in sparsity after decreasing the upper limit... if I am not mixing things up.

I wonder what the smart way is to set the lambda limits for Stabl. Max lambda = 2 caused it to select large feature subsets on my data, so I tried to increase it, as I need a sparser subset. But if I had no preference for sparsity, what should I have done?

I think fitting regular Lasso with cv to select lambda.min would not be affected by changes in lambda limits unless they change lambda.min.
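A minimal sketch to check that intuition (illustration only; LogisticRegressionCV is used here as the cross-validated analogue of selecting lambda.min, and the grid endpoints are arbitrary):

import numpy as np
from sklearn.linear_model import LogisticRegressionCV

# Same X, y as above; the selected C_ should be roughly the same for both grids
# as long as the cross-validation optimum lies inside both of them.
narrow = LogisticRegressionCV(Cs=np.linspace(0.01, 2, 50), penalty="l1",
                              solver="liblinear", cv=5, random_state=1).fit(X, y)
wide = LogisticRegressionCV(Cs=np.linspace(0.01, 10, 50), penalty="l1",
                            solver="liblinear", cv=5, random_state=1).fit(X, y)
print(narrow.C_, wide.C_)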

xavdurand commented 8 months ago

I will answer step by step:

Thanks for your answer. This is actually strange, as larger C values correspond to weaker regularization, so we could have expected an increase in sparsity after decreasing the upper limit... if I am not mixing things up.

Following the sklearn LogisticRegression documentation, in the classification case the C parameter is used, and it is the inverse of the regularization strength: a greater value decreases sparsity.
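As a quick illustration (not from the original thread), reusing the X and y defined above, smaller C values give sparser coefficient vectors:

from sklearn.linear_model import LogisticRegression

# Count non-zero coefficients for increasingly weak regularization (larger C).
for C in (0.01, 0.1, 1.0, 10.0):
    m = LogisticRegression(penalty="l1", C=C, solver="liblinear", max_iter=int(1e6)).fit(X, y)
    print(C, int((m.coef_ != 0).sum()), "non-zero coefficients")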

I wonder what the smart way is to set the lambda limits for Stabl. Max lambda = 2 caused it to select large feature subsets on my data, so I tried to increase it, as I need a sparser subset. But if I had no preference for sparsity, what should I have done?

It is possible to determine a good range of C using the l1_min_c function from sklearn.svm:

from sklearn.svm import l1_min_c

# X is the input and y is the output
min_C = l1_min_c(X, y, loss="log")

Then, based on min_C, you can construct an interesting range by increasing it by a fixed step.
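For instance (a sketch only; the span of three decades and 30 grid points below are arbitrary choices, not a Stabl recommendation):

import numpy as np
from sklearn.svm import l1_min_c
from stabl.stabl import Stabl

min_C = l1_min_c(X, y, loss="log")  # smallest C giving any non-zero coefficient
stabl = Stabl(
    model,
    lambda_grid={"C": min_C * np.logspace(0, 3, 30)},  # geometric grid above min_C
    n_bootstraps=1000,
    artificial_type="knockoff",
    verbose=1,
    random_state=1,
)
stabl.fit(X, y)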

If you are interested in using Stabl, we can discuss by email: xdurand@surge.care.

mshqn commented 8 months ago

Thanks, I didn't know about this function. Hope this will be useful for other users.

xavdurand commented 8 months ago

If you are interested, there is a similar behavior in the regression task with the maximum value of the $\lambda$ parameter of the Lasso:

[screenshot from the cited reference not shown]

source: Friedman, J. H., Hastie, T., & Tibshirani, R. (2010). Regularization Paths for Generalized Linear Models via Coordinate Descent. Journal of Statistical Software, 33(1), 1–22. https://doi.org/10.18637/jss.v033.i01
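The key fact there is that, above a certain penalty value, the Lasso solution is identically zero. A minimal sketch checking this numerically (illustration only, using scikit-learn's Lasso parameterization, i.e. the objective (1/(2n))*||y - Xw - b||^2 + alpha*||w||_1 with an intercept):

import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso

Xr, yr = make_regression(n_samples=100, n_features=10, n_informative=5, random_state=0)

# With an intercept, alpha_max = max_j |x_j^T (y - mean(y))| / n is the smallest
# penalty for which every coefficient is exactly zero.
alpha_max = np.abs(Xr.T @ (yr - yr.mean())).max() / len(yr)

print(int((Lasso(alpha=alpha_max * 1.01).fit(Xr, yr).coef_ != 0).sum()))  # 0
print(int((Lasso(alpha=alpha_max * 0.50).fit(Xr, yr).coef_ != 0).sum()))  # > 0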