gregbellan / Stabl


Is the definition of selection frequency correct? #15

Open BigBroKuang opened 4 weeks ago

BigBroKuang commented 4 weeks ago
  1. I used LogisticRegression to test the code. I set C (the inverse of lambda) to np.logspace(-2, 2, 30) and observed that the highest selection frequency always occurs at the highest C (lowest lambda). It seems that the Lasso penalty is not doing anything? (A minimal sketch of this setup follows the list.)
  2. According to the definition of fj, when a feature j enters the Lasso path at a specific lambda (let's say L0, going from high to low values), then for L1 < L0 the coefficient beta of the feature is non-zero. Theoretically speaking, if L2 < L1 < L0, we can say that fj(L2) > fj(L1) > fj(L0), which means that a smaller lambda produces a higher selection frequency. Doesn't that mean there is no need to test multiple lambda values, and that setting lambda to an extremely small value would give the best selection frequency?
  3. I also tried to rerun the code with different random seeds; the results of each run are totally different because the knockoffs generated in each run are different.
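For reference, here is a minimal sketch of the setup described in point 1: bootstrapped L1-penalized LogisticRegression over a grid of C values, counting how often each feature gets a non-zero coefficient. All names and parameters are illustrative, not the actual Stabl code.

```python
# Minimal sketch (illustrative, not the actual Stabl code) of computing
# selection frequencies with an L1-penalized logistic regression over a
# grid of C values, using bootstrapped subsamples.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=200, n_features=30,
                           n_informative=5, random_state=0)

Cs = np.logspace(-2, 2, 30)              # C is the inverse of lambda
n_bootstraps = 50
freq = np.zeros((len(Cs), X.shape[1]))   # frequency per (C, feature) pair

for _ in range(n_bootstraps):
    idx = rng.choice(len(y), size=len(y) // 2, replace=False)  # subsample
    for i, C in enumerate(Cs):
        clf = LogisticRegression(penalty="l1", C=C, solver="liblinear")
        clf.fit(X[idx], y[idx])
        freq[i] += clf.coef_[0] != 0     # count non-zero coefficients

freq /= n_bootstraps
# For most features, freq peaks at the largest C (smallest lambda),
# since weaker regularization leaves more coefficients non-zero.
```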
xavdurand commented 3 weeks ago

Hello @BigBroKuang ,

Thank you for your interesting questions and your observations. To answer your first two questions, I should warn you that solution paths for the lasso estimator are not always monotonic. It is possible to have a lambda L1 where the coefficient of a feature is non-zero, whereas it is 0 for lambdas L0 and L2 with L2 < L1 < L0. I invite you to dig into this thread: https://stats.stackexchange.com/questions/154825/what-to-conclude-from-this-lasso-plot-glmnet
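As an illustration of this non-monotonicity, here is a small sketch (synthetic data; all names are illustrative) that inspects the lasso support along the path and flags features that enter it more than once. Whether a re-entry actually shows up depends on the data and the seed, but correlated designs like this one make it likely.

```python
# Sketch: detect features that leave and re-enter the lasso active set
# as lambda decreases, using a correlated synthetic design.
import numpy as np
from sklearn.linear_model import lasso_path

rng = np.random.default_rng(42)
n, p = 100, 20
X = rng.standard_normal((n, p))
X[:, 1] = X[:, 0] + 0.1 * rng.standard_normal(n)    # strongly correlated pair
y = X[:, 0] + 0.5 * X[:, 2] + rng.standard_normal(n)

alphas, coefs, _ = lasso_path(X, y)   # alphas are returned in decreasing order
active = coefs != 0                   # support along the path, shape (p, n_alphas)

for j in range(p):
    entries = np.flatnonzero(np.diff(active[j].astype(int)) == 1)
    if len(entries) > 1:              # entered the support more than once
        print(f"feature {j} enters the path {len(entries)} times")
```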

Your third point also touches on a second reason for using multiple lambdas. Because the artificial features are generated by a random process, the selection process of Stabl is seed dependent. And because we bootstrap on a subset of the whole dataset, the coefficient values are not the same across bootstraps.
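To make the seed dependence concrete, here is a toy sketch. It uses independently permuted columns as a stand-in for the actual knockoff generator, and a simplified threshold (the maximum frequency among artificial features) rather than Stabl's actual criterion, so everything here is an assumption for illustration only.

```python
# Toy sketch of seed dependence: the artificial (null) features, and hence
# the selection threshold, change with the seed. Permuted columns stand in
# for the real knockoff generator; the threshold rule is simplified.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=150, n_features=10,
                           n_informative=3, random_state=0)

def selected_features(seed, n_bootstraps=30, C=1.0):
    rng = np.random.default_rng(seed)
    X_art = rng.permuted(X, axis=0)           # each column shuffled independently
    X_aug = np.hstack([X, X_art])
    freq = np.zeros(X_aug.shape[1])
    for _ in range(n_bootstraps):
        idx = rng.choice(len(y), size=len(y) // 2, replace=False)
        clf = LogisticRegression(penalty="l1", C=C, solver="liblinear")
        clf.fit(X_aug[idx], y[idx])
        freq += clf.coef_[0] != 0
    freq /= n_bootstraps
    threshold = freq[X.shape[1]:].max()       # max frequency among artificials
    return set(np.flatnonzero(freq[:X.shape[1]] > threshold))

print(selected_features(seed=1))
print(selected_features(seed=2))              # often a slightly different set
```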

Let me know if you have any other questions, Xavier

BigBroKuang commented 3 weeks ago

Thank you so much for your reply!

I tried to test the method on the entire dataset. Stabl produces a different number of features for each random seed. My question is: how should we determine the best seed or result?

BigBroKuang commented 3 weeks ago

Intuitively speaking, the reason we use LASSO for feature selection is that after a feature j enters the lasso path at a specific lambda L0, for any L1 < L0 the coefficient beta of j is theoretically non-zero, and its magnitude increases monotonically. Yes, it is true that some coefficients are not monotonically increasing after L0 in real experiments, but I think this phenomenon is due to the randomization of the initialization of the fitting parameters or the introduction of knockoffs. If we assume that beta can theoretically be either 0 or non-zero after L0, we cannot trust LASSO anymore, since it no longer produces patterned results on beta. According to the definition of selection frequency in your paper, you assumed that beta could be either 0 or non-zero.
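One way to probe this claim is to check, on a plain lasso fit with no knockoffs (coordinate descent here has no random initialization, so the fit is deterministic), whether |beta_j(lambda)| is monotone once feature j enters the path. A sketch under those assumptions, with an intentionally correlated design:

```python
# Sketch: test whether |beta_j| is monotone non-decreasing once feature j
# enters the lasso path. No knockoffs and no random initialization: any
# violation found here comes from the lasso path itself.
import numpy as np
from sklearn.linear_model import lasso_path

rng = np.random.default_rng(7)
n, p = 120, 15
X = rng.standard_normal((n, p))
X[:, 3] = 0.9 * X[:, 0] + 0.1 * rng.standard_normal(n)  # correlated with X[:, 0]
beta_true = np.zeros(p)
beta_true[[0, 1, 2]] = [2.0, -1.5, 1.0]
y = X @ beta_true + 0.5 * rng.standard_normal(n)

alphas, coefs, _ = lasso_path(X, y)      # alphas decrease along the path
for j in range(p):
    mags = np.abs(coefs[j])
    entered = np.flatnonzero(mags > 0)
    if entered.size and np.any(np.diff(mags[entered[0]:]) < -1e-10):
        print(f"feature {j}: |beta| is not monotone after entry")
```

On correlated designs like this the check can fire even though the solver is deterministic, which suggests the non-monotonicity is a property of the lasso path itself rather than of random initialization.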

xavdurand commented 3 weeks ago

Hello @BigBroKuang,

You are right that Stabl depends on the random seed. The knockoff generation has an impact on the number of selected variables. In practice the number of selected variables might change a little, but the set of truly informative features is theoretically selected for every random seed.

Regarding your second comment: the non-increasing behavior of the frequency path (the selection frequency of a feature across multiple lambdas) can be observed if you decrease the number of bootstraps. Increasing the number of bootstraps reduces this effect.
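A sketch of that effect, under the same kind of illustrative setup as above (none of these names come from the Stabl code): compare the frequency path of one feature at a small and a larger number of bootstraps.

```python
# Sketch: frequency path of a single feature across a C (lambda) grid,
# at a small vs. a larger number of bootstraps. The small-B path is
# jagged and can decrease; the large-B path is smoother.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, n_features=20,
                           n_informative=4, random_state=0)
Cs = np.logspace(-2, 1, 15)

def frequency_path(feature, n_bootstraps, seed=0):
    rng = np.random.default_rng(seed)
    freq = np.zeros(len(Cs))
    for _ in range(n_bootstraps):
        idx = rng.choice(len(y), size=len(y) // 2, replace=False)
        for i, C in enumerate(Cs):
            clf = LogisticRegression(penalty="l1", C=C, solver="liblinear")
            clf.fit(X[idx], y[idx])
            freq[i] += clf.coef_[0, feature] != 0
    return freq / n_bootstraps

print(frequency_path(feature=0, n_bootstraps=10))   # noisy path
print(frequency_path(feature=0, n_bootstraps=100))  # smoother path
```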