vtshen closed this issue 6 years ago
It seems that you have not specified Gamma (i.e., the regularization parameter for the L2 term) when using the L0L2 penalty in your function call:
fit_l0l2 = L0Learn.fit(x, y, penalty="L0L2", algorithm="CDPSI", nLambda=50, maxSwaps = 1, maxIters = 1e4, maxSuppSize = max(50,ncol(x),nrow(x)))
betahat_l0l2 = as.matrix(coef(fit_l0l2))
Please have a look at the Vignette:
http://www.mit.edu/~hazimeh/L0Learn-vignette.html
on how to specify this or use the function properly.
L0 and L0L2 (with Gamma = 0) should return similar solutions on the same grid of lambda values. Since the lambda values are chosen automatically by default, you may want to override them by specifying your own lambda grid (see the Vignette on how to do this).
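As a rough sketch of what a user-specified lambda grid can look like for the plain L0 penalty: the data below are synthetic, the lambda values are arbitrary placeholders, and the `lambdaGrid` argument is used as I understand it from the vignette (a list with one decreasing lambda sequence per gamma). Some older releases may also need `autoLambda = FALSE`, so please check `?L0Learn.fit` for your installed version.

```r
library(L0Learn)

# Synthetic data, for illustration only.
set.seed(1)
n <- 100; p <- 20
x <- matrix(rnorm(n * p), n, p)
beta <- c(rep(1, 3), rep(0, p - 3))
y <- as.vector(x %*% beta + rnorm(n))

# One decreasing lambda sequence per gamma. Plain L0 has no gamma grid,
# so the list has length 1. The values are placeholders, not recommendations.
user_lambda <- list(c(1, 1e-1, 1e-2, 1e-3, 1e-4))

fit_l0 <- L0Learn.fit(x, y, penalty = "L0", lambdaGrid = user_lambda)
print(fit_l0)  # lambda values on the path and the support size at each one
```

For L0L2 the same idea should apply with one lambda vector per gamma value, which is what makes a like-for-like comparison with L0 possible.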
Usually, positive regularization on the L2 term (in L0L2) leads to additional shrinkage, so the coefficient values are smaller in magnitude than those obtained without L2 shrinkage (assuming both lead to the same support).
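To illustrate the shrinkage effect on a fixed support, here is a small base-R sketch that does not call L0Learn at all. It assumes the objective has the form 0.5 * ||y - X b||^2 + gamma * ||b||_2^2 once the support is fixed (this scaling is my assumption, not taken from the package source), and the support columns `x_s` and the value gamma = 5 are made up for the example.

```r
set.seed(1)
n <- 50
x_s <- matrix(rnorm(n * 3), n, 3)  # columns of a hypothetical fixed support
y <- as.vector(x_s %*% c(2, -1, 0.5) + rnorm(n))

# Refit on the fixed support for a given gamma (gamma = 0 is least squares).
# Under the assumed objective the normal equations are
# (X'X + 2 * gamma * I) b = X'y.
refit <- function(gamma) {
  solve(crossprod(x_s) + 2 * gamma * diag(ncol(x_s)), crossprod(x_s, y))
}

b_ls <- refit(0)  # no L2 term
b_l2 <- refit(5)  # positive L2 term

cbind(least_squares = as.vector(b_ls), with_l2 = as.vector(b_l2))
sqrt(sum(b_l2^2)) < sqrt(sum(b_ls^2))  # the L2-regularized fit has smaller norm
```

The support is identical in both fits, so any difference in the coefficients comes from the L2 term alone.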
Also, to add to Rahul's comment, L0Learn currently supports only CDPSI(1); higher-order swaps are not supported. The parameter maxSwaps is an upper bound on the number of iterations performed by the swapping algorithm (specifically, in the paper, this is the maximum number of iterations performed by the for loop in Algorithm 2). The current default value for maxSwaps is 100, and we don't recommend lowering it, since small values such as 1 will typically not lead to CDPSI(1) minima.
Hi Prof. Rahul Mazumder and Hussein, thanks so much for the quick response. I am now using the code below to run the L0L2 method, and the plots look better.
Please correct me if I am wrong: I did not specify gamma because I thought I could rely on the default values, which are nGamma = 10, gammaMax = 10, gammaMin = 1e-04.
fit_l0l2 = L0Learn.fit(x, y, penalty="L0L2", algorithm="CDPSI", nLambda=50, maxIters = 1e4, maxSuppSize = min(50,ncol(x),nrow(x)))
I have more settings and I will investigate those cases.
That's great, and thanks for the feedback! The gamma parameter range for L2 regularization is very problem dependent. It might be a good idea to start with a wide range for gamma and plot the cross-validation errors. Based on the shape of the plot, you can select a narrower range and rerun cross-validation on this new range. The default values of gamma supplied in the package are generally not optimal for a given problem. I will make this point clearer in the next update to the vignette.
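A minimal sketch of that first, wide pass, on synthetic data. The range 1e-6 to 100, the fold count, and maxSuppSize = 20 are arbitrary choices for the example, and the fields `cvMeans` and `fit$gamma` are used as I understand them from the vignette, so please double check against your installed version.

```r
library(L0Learn)

# Synthetic data, for illustration only.
set.seed(1)
n <- 200; p <- 50
x <- matrix(rnorm(n * p), n, p)
y <- as.vector(x[, 1:5] %*% rep(1, 5) + rnorm(n))

# First pass: a deliberately wide gamma range (values are placeholders).
cvfit <- L0Learn.cvfit(x, y, penalty = "L0L2", nFolds = 5, seed = 1,
                       nGamma = 10, gammaMin = 1e-6, gammaMax = 100,
                       maxSuppSize = 20)

# For each gamma in the grid, the smallest CV error over its lambda path.
best_cv <- sapply(cvfit$cvMeans, min)

plot(cvfit$fit$gamma, best_cv, log = "x", type = "b",
     xlab = "gamma", ylab = "min CV error over lambda")
```

If the curve has a clear dip, gammaMin and gammaMax can be tightened around it and the cross-validation rerun on the narrower range.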
I have another question about the values of gamma, for the case where cross-validation is not a convenient way to determine the range of gamma.
I am assuming that, inside the package code, the value of gamma is not adjusted or scaled based on the sample size n. Please correct me if that is wrong.
If so, and I have two candidates for the maximum value of gamma, max(abs(xty))/nrow(x) and max(abs(xty)), which one do you think is more reasonable?
Before fitting, the package normalizes both y and the columns of X to have unit l2 norm. The values of lambda/gamma used (i.e., those specified by the user or selected by the package) are with respect to the normalized data.
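To make the scale concrete, here is a base-R sketch of the normalization described above, applied to synthetic data. It only rescales to unit l2 norm; any centering or other preprocessing the package may do is not modeled here. The variable names are made up for the example.

```r
set.seed(1)
n <- 100; p <- 10
x <- matrix(rnorm(n * p, sd = 3), n, p)
y <- as.vector(x[, 1] * 2 + rnorm(n))

# Scale each column of x, and y itself, to unit l2 norm.
x_n <- sweep(x, 2, sqrt(colSums(x^2)), "/")
y_n <- y / sqrt(sum(y^2))

# Compare the two raw candidates with the same quantity on normalized data.
max_raw  <- max(abs(crossprod(x, y)))
max_norm <- max(abs(crossprod(x_n, y_n)))
c(raw = max_raw, raw_over_n = max_raw / n, normalized = max_norm)
```

On the normalized data each inner product is at most 1 in absolute value (Cauchy-Schwarz), so quantities computed from the raw x and y are on a different scale from the lambda/gamma values the package works with.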
The L0 and L0L2 results from CDPSI(1) are confusing to me. The cause could be either my usage of the package or my poor understanding of the L0 and L0L2 methods, so I am posting to ask for help and avoid using the package incorrectly.
First, please correct me if I am wrong: I used the arguments algorithm="CDPSI" and maxSwaps=1 to run the CDPSI(1) algorithm mentioned in the paper.
The graphs below compare the number of nonzero coefficients and the risk errors of the L0 and L0L2 methods in the package. The parameters are tuned on validation data sets.
In that setting, the plots show that L0L2 has a larger risk error than L0, and that L0L2 sometimes selects fewer nonzero coefficients than L0. Why would L0L2 select fewer nonzero coefficients?
(plots updated based on comments)
R code to generate the plots (updated based on comments)