Open BigBroKuang opened 4 weeks ago
Hello @BigBroKuang ,
Thank you for your interesting questions and your observation. To answer your first two questions, I should warn you that solution paths for the lasso estimator are not always monotonic. It is possible to have a lambda L1 where the coefficient of a feature is nonzero, while it is 0 for lambdas L0 and L2 with L2 < L1 < L0. I invite you to dig into this thread: https://stats.stackexchange.com/questions/154825/what-to-conclude-from-this-lasso-plot-glmnet
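This non-monotonicity is easy to look for with scikit-learn's `lasso_path` on a small toy dataset (hypothetical data, not Stabl's): when two features are highly correlated, a feature can enter the active set at one lambda and drop back out at a smaller one.

```python
import numpy as np
from sklearn.linear_model import lasso_path

# Hypothetical toy data (not Stabl's): feature 1 nearly duplicates
# feature 0, so the lasso can swap them along the path.
rng = np.random.RandomState(0)
n, p = 100, 6
X = rng.randn(n, p)
X[:, 1] = X[:, 0] + 0.05 * rng.randn(n)
y = X[:, 0] + 0.3 * rng.randn(n)

# lasso_path returns alphas in decreasing order; coefs has shape
# (n_features, n_alphas).
alphas, coefs, _ = lasso_path(X, y, n_alphas=30)
active = np.abs(coefs) > 1e-10  # active set at each alpha

# A row of `active` that turns on and later turns off again as alpha
# decreases is a non-monotone path: the feature entered, then left.
for j in range(p):
    print(j, "".join("x" if a else "." for a in active[j]))
```

The printed rows show, for each feature, at which penalties it is active; any `x…x.…x` pattern is a feature leaving and re-entering the model.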
Also, your third point touches on a second reason for using multiple lambdas. Because the artificial features are generated by a random process, Stabl's selection process is seed-dependent. And since we bootstrap on subsets of the whole dataset, a feature's coefficient value is not the same across bootstraps.
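A minimal sketch of that bootstrap effect (toy data and plain scikit-learn `Lasso`, not Stabl's pipeline): fitting the same penalty on two different subsamples yields different coefficient values, even though a strongly informative feature stays selected in both fits here.

```python
import numpy as np
from sklearn.linear_model import Lasso

# Hypothetical toy data; plain scikit-learn Lasso, not Stabl's pipeline.
rng = np.random.RandomState(0)
X = rng.randn(100, 5)
y = X[:, 0] + 0.5 * rng.randn(100)

# Two different half-size subsamples, as a bootstrap scheme would draw.
idx1 = rng.choice(100, size=50, replace=False)
idx2 = rng.choice(100, size=50, replace=False)
c1 = Lasso(alpha=0.1, max_iter=5000).fit(X[idx1], y[idx1]).coef_
c2 = Lasso(alpha=0.1, max_iter=5000).fit(X[idx2], y[idx2]).coef_

# Coefficient values differ between the two subsamples, though the
# strongly informative feature 0 stays nonzero in both fits here.
print(c1)
print(c2)
```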
Tell me if you have other questions, Xavier
Thank you so much for your reply!
I tried to test the method on the entire dataset. Stabl selects a different number of features with different random seeds. My question is: how should we determine the best seed or result?
Intuitively, the reason we use LASSO for feature selection is this: after a feature j enters the lasso path at a specific lambda L0, for any L1 < L0 the coefficient beta of j is theoretically nonzero and increases monotonically in magnitude. Yes, it is true that some coefficients are not monotonically increasing after L0 in real experiments, but I think this phenomenon is due to the random initialization of the fitting parameters or the introduction of knockoffs. If we assume it is theoretically true that beta could be either 0 or nonzero after L0, we cannot trust LASSO anymore, since it no longer produces patterned results for beta. According to the definition of selection frequency in your paper, you assumed that beta could be either 0 or nonzero.
Hello @BigBroKuang,
You are right that Stabl depends on the random seed. The knockoff generation has an impact on the number of selected variables. The number of selected variables might change a little, but the set of really informative features is theoretically selected for every random seed. In practice:
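One way to exploit that property (my own sketch with a simple lasso-based stability selector, not Stabl's implementation) is to run the selection under several seeds and intersect the selected sets: features that survive every seed are the ones to report.

```python
import numpy as np
from sklearn.linear_model import Lasso

def select_features(X, y, alpha, seed, n_boot=30, threshold=0.6):
    """Toy stability selector (a sketch, not Stabl's implementation):
    keep features whose bootstrap selection frequency >= threshold."""
    rng = np.random.RandomState(seed)
    n, p = X.shape
    counts = np.zeros(p)
    for _ in range(n_boot):
        idx = rng.choice(n, size=n // 2, replace=False)
        coef = Lasso(alpha=alpha, max_iter=5000).fit(X[idx], y[idx]).coef_
        counts += np.abs(coef) > 1e-10
    return set(np.flatnonzero(counts / n_boot >= threshold))

rng = np.random.RandomState(2)
X = rng.randn(150, 6)
y = 2 * X[:, 0] + 2 * X[:, 1] + rng.randn(150)

# Run the whole selection under several seeds and intersect the sets:
# strongly informative features (0 and 1 here) survive every seed.
sets = [select_features(X, y, alpha=0.1, seed=s) for s in range(5)]
stable = set.intersection(*sets)
print(stable)
```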
For your second comment, the non-increasing behavior of the frequency path (the selection frequency of a feature across multiple lambdas) can be observed if you decrease the number of bootstraps. Increasing the number of bootstraps reduces this effect.
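To see the bootstrap-count effect, one can compare frequency paths computed with few versus many bootstraps (again a toy sketch with plain scikit-learn `Lasso`, not Stabl itself): with only 10 bootstraps the frequencies move in coarse 0.1 steps and the path is typically bumpy, while 200 bootstraps give a much smoother curve along alpha.

```python
import numpy as np
from sklearn.linear_model import Lasso

def freq_path(X, y, alphas, n_boot, seed=0):
    """Selection frequency of each feature at each alpha (toy sketch)."""
    rng = np.random.RandomState(seed)
    n, p = X.shape
    freq = np.zeros((len(alphas), p))
    for _ in range(n_boot):
        idx = rng.choice(n, size=n // 2, replace=False)
        for i, a in enumerate(alphas):
            coef = Lasso(alpha=a, max_iter=5000).fit(X[idx], y[idx]).coef_
            freq[i] += np.abs(coef) > 1e-10
    return freq / n_boot

rng = np.random.RandomState(3)
X = rng.randn(120, 5)
y = X[:, 0] + 0.7 * rng.randn(120)
alphas = np.linspace(0.3, 0.01, 10)  # from strong to weak penalty

# Few bootstraps -> coarse, often non-monotone frequency path;
# many bootstraps -> smoother path for the same feature.
path_small = freq_path(X, y, alphas, n_boot=10)
path_large = freq_path(X, y, alphas, n_boot=200)
print(path_small[:, 0])
print(path_large[:, 0])
```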