zdebruine / RcppML

Rcpp Machine Learning: Fast robust NMF, divisive clustering, and more
GNU General Public License v2.0

Clarification wrt L1/Lasso penalties #3

Closed sinanassiri closed 2 years ago

sinanassiri commented 2 years ago

Hi Zach and team, thanks for this awesome package!

I'd like to seek some clarification about the L1 penalty parameters in the nmf function:

L1/LASSO penalties between 0 and 1, array of length two for c(w,h)

Are w and h regularized independently? In Figure 4 of the preprint you show the effect of varying the penalty parameter per run, but it isn't clear whether that refers to the penalty on w, on h, or on both (set to the same value?). In Figure S4, however, you specifically focus on regularization of w. What are the implications of regularizing one of the two matrices (w or h) versus both simultaneously? I'm thinking about how best to vary these two parameters in a cross-validation scheme; does a grid search over pairwise combinations of L1 penalties c(w, h), as sketched below, sound reasonable to you?
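To make the question concrete, here's a rough sketch of the kind of grid search I have in mind, on toy data. I'm assuming the list-style return value (model$w, model$d, model$h) and the exported mse() helper here, and using plain reconstruction MSE only as a placeholder objective:

```r
library(RcppML)
library(Matrix)

# toy non-negative sparse data, just for illustration
set.seed(1)
A <- abs(rsparsematrix(500, 200, density = 0.1))

# pairwise combinations of L1 penalties for c(w, h)
penalties <- c(0, 0.01, 0.05, 0.1, 0.2)
grid <- expand.grid(L1_w = penalties, L1_h = penalties)

grid$mse <- apply(grid, 1, function(p) {
  model <- nmf(A, k = 10, L1 = c(p[["L1_w"]], p[["L1_h"]]), seed = 1)
  # placeholder objective: reconstruction error of the fitted model
  mse(A, model$w, model$d, model$h)
})

grid[which.min(grid$mse), ]
```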

Many thanks, Sina

zdebruine commented 2 years ago

Sina,

Happy to help! You are correct: w and h can be regularized independently, as we do in Figure S4, or together, as we do in Figure 4. You can see this by playing with the parameters and measuring the sparsity of w and h; the results will be very clear. At modest penalties (e.g., L1 < 0.1) there is no significant difference between regularizing both sides and regularizing just one -- it's simply a matter of how sparse you want the trailing edge of w or h to be. It is unclear to me whether regularizing both sides has any special properties over regularizing just one side that become important at very harsh penalties (e.g., L1 > 0.2).
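For example, a quick comparison along these lines makes the effect easy to see (just a sketch on toy data; I'm assuming the list-style model$w / model$h return value):

```r
library(RcppML)
library(Matrix)

set.seed(1)
A <- abs(rsparsematrix(500, 200, density = 0.1))  # toy non-negative data

sparsity <- function(x) mean(x == 0)  # fraction of exactly-zero entries

m_w    <- nmf(A, k = 10, L1 = c(0.1, 0),   seed = 1)  # penalize w only
m_h    <- nmf(A, k = 10, L1 = c(0, 0.1),   seed = 1)  # penalize h only
m_both <- nmf(A, k = 10, L1 = c(0.1, 0.1), seed = 1)  # penalize both

c(w_only = sparsity(m_w$w), h_only = sparsity(m_h$h),
  both_w = sparsity(m_both$w), both_h = sparsity(m_both$h))
```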

I'll clear up those doubts about how we regularized in Figure 4 in the next manuscript revision -- thanks!

I'm not sure a grid search for L1 regularization makes sense. L1 regularization simply sets nearly-zero coefficients to zero (as it should); it doesn't change the information in the model or significantly alter its loss (see Figure S4). In light of that, what would your cross-validation objective be? Sparsity? I'm not aware of any convex statistical measure of sparsity that can be minimized. I'm trying to wrap my head around this as much as you are, so any ideas are appreciated! It's more interpretable to have sparse factors with just a few important features than factors carrying those same values plus a bunch of nearly-zero ones, and I think that's what L1 in NMF is all about.

Zach

sinanassiri commented 2 years ago

Zach, thanks for the prompt and detailed reply. I was thinking of MSE as the cross-validation objective, but I see your point. I'll keep playing with the L1 penalty and will let you know if anything worth sharing comes up.

This was helpful, thanks again for your time!

Best, Sina

zdebruine commented 2 years ago

See Figure S4C. L1 penalties will only ever increase MSE, but the difference is nearly negligible at low penalties. The effect becomes significant at higher penalties, so you could decide on a hard MSE cutoff and find that tipping point.
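A quick sweep like the one below should show MSE staying nearly flat at low penalties and climbing at harsher ones, which is where you'd place that cutoff (a sketch on toy data; the mse() helper and the list-style model$w / model$d / model$h fields are assumptions about the version you're running):

```r
library(RcppML)
library(Matrix)

set.seed(1)
A <- abs(rsparsematrix(500, 200, density = 0.1))  # toy non-negative data

penalties <- seq(0, 0.3, by = 0.05)
err <- sapply(penalties, function(p) {
  m <- nmf(A, k = 10, L1 = c(p, p), seed = 1)  # same penalty on w and h
  mse(A, m$w, m$d, m$h)
})

plot(penalties, err, type = "b",
     xlab = "L1 penalty (w and h)", ylab = "MSE")
```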