zdebruine / RcppML

Rcpp Machine Learning: Fast robust NMF, divisive clustering, and more
GNU General Public License v2.0

Scope for logistic loss function? #56

Closed GScottAlder closed 4 days ago

GScottAlder commented 1 week ago

Hello,

I'm trying to build a model for factorizing a large sparse binary matrix into non-negative factors. I've done some research into possible libraries for implementing this model, and this package really stands out to me. I especially appreciate the excellent documentation and the posts at https://www.zachdebruine.com/

Question for the authors: would it be possible to modify this library to use a logistic loss function rather than a least-squares fit? The literature suggests that logistic loss yields better results for binary data (https://web.stanford.edu/~boyd/papers/pdf/glrm.pdf).

1. Is it even possible to modify the loss function without completely breaking this library?
2. If it is possible, what changes would be necessary?
3. If it isn't possible, are there any recommendations for large sparse logistic NMF?

I apologize for the rather open-ended question, but this package is the closest to what I am looking for so I thought I'd ask.

Thank you for your time, Scott

zdebruine commented 6 days ago

Hi Scott,

Yeah, the implementation I can intuitively see would be constraining the coordinate descent solver to either 0 or 1, picking whichever of the two is closer. Mathematically, while this works fine for a one-off least squares solve, it creates an issue with the convexity of the NMF objective and may introduce superficial local minima that preclude discovery of a reasonably optimal solution.

I have not read through the resource you linked (if you have a particular page/solution for NMF in there, I'm happy to check it out). One thought that occurs is adding regularization and annealing, where the regularization term $\alpha$ would weight the NNLS objective ($1 - \alpha$) vs. the logistic objective ($\alpha$), and the annealing term $\beta$ would provide a step function to move $\alpha$ from 0 at initialization to 1 at convergence.
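Roughly, the blended objective I have in mind (a sketch of the idea only, nothing like this is implemented) would be

$$\mathcal{L}(U,V) = (1-\alpha)\,\mathcal{L}_{\mathrm{NNLS}}(U,V) + \alpha\,\mathcal{L}_{\mathrm{logistic}}(U,V), \qquad \alpha \leftarrow \min(1,\ \alpha + \beta)$$

with $\alpha$ stepped up by $\beta$ at each outer iteration, so early updates stay in the well-behaved least-squares regime and later updates refine against the logistic objective.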

This is not super hard to implement, but it does add complexity to the codebase. It's also been a while since I've done anything on the library, so I'm not sure about the dev time to get it done.

-Zach

GScottAlder commented 6 days ago

Hello,

Thanks for the quick response. Let me provide some more details.

We would not want/need to constrain the factors to elements in {0,1}; it's better for the factors to be continuous, non-negative real numbers. The change I am suggesting is in the loss function: rather than minimizing quadratic loss (also called the sum of squared errors, or the Frobenius norm if you're fancy), we would minimize log loss.

For "standard" NMF, we have

$y_{i,j} \in \mathbb{R}$, $y_{i,j} = u_i v_j + \epsilon$, $\epsilon \sim N(0, \sigma^2)$

Fitting this model requires minimizing the quadratic loss

$\mathrm{loss}(U,V) = \sum_{i,j} (y_{i,j} - u_i v_j)^2$

For the "logistic" NMF I am proposing, we would have

$y_{i,j} \in \{0,1\}$, $P_{i,j} = \mathrm{logit}^{-1}(u_i v_j)$, $y_{i,j} \sim \mathrm{Bernoulli}(P_{i,j})$

where $\mathrm{logit}^{-1}(x) = \frac{1}{1 + \exp(-x)}$, which maps the factorization $u_i v_j$ from the real numbers to the interval $(0,1)$.

Fitting this model requires minimizing the log loss

$\mathrm{loss}(U,V) = -\sum_{i,j} \left[ y_{i,j} \log(P_{i,j}) + (1-y_{i,j}) \log(1 - P_{i,j}) \right] = -\sum_{i,j} \left[ y_{i,j} \log(\mathrm{logit}^{-1}(u_i v_j)) + (1-y_{i,j}) \log(1 - \mathrm{logit}^{-1}(u_i v_j)) \right]$

This post has some more details on logistic regression loss functions (including derivative for computing gradients): https://leimao.github.io/blog/Conventional-Classification-Loss-Functions/
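If I've done the algebra right, the gradients keep the same simple residual form as in ordinary logistic regression (the $\mathrm{logit}^{-1}$ terms cancel nicely), which is what an updated solver would need:

$$\frac{\partial\,\mathrm{loss}}{\partial u_i} = \sum_j \left(P_{i,j} - y_{i,j}\right) v_j, \qquad \frac{\partial\,\mathrm{loss}}{\partial v_j} = \sum_i \left(P_{i,j} - y_{i,j}\right) u_i$$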

My original question boils down to: is it possible to swap out the loss/gradient calculations in this library to compute the logistic NMF instead of standard NMF? Would this new loss function still work with a sparse matrix?

Thanks, Scott

zdebruine commented 6 days ago

I see. No, it is not possible to swap out loss functions easily. Your best bet would be to fork the repo and change `c_nnls()` in https://github.com/zdebruine/RcppML/blob/main/inst/include/RcppML/nnls.hpp, and `predict()` in https://github.com/zdebruine/RcppML/blob/main/inst/include/RcppML/predict.hpp. It would be a pretty substantial lift.

The codebase in singlet is admittedly a lot cleaner; see https://github.com/zdebruine/singlet. If you do this, and it works, feel free to submit a PR back :)
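To give a sense of the scope, here is a minimal standalone sketch of the kind of update that would have to replace the least-squares coordinate descent. To be clear, this is illustrative only: the names (`inv_logit`, `update_U`) and the dense projected-gradient formulation are made up for the example, not RcppML's actual internals.

```cpp
// Hypothetical sketch -- none of these names exist in RcppML.
// Projected-gradient update for logistic NMF on a dense binary matrix:
//   loss(U,V) = -sum_{i,j} [ y_ij * log(P_ij) + (1 - y_ij) * log(1 - P_ij) ]
//   P_ij = inv_logit(u_i . v_j), with non-negativity enforced by clamping.
#include <cmath>
#include <cstdio>
#include <vector>

static double inv_logit(double x) { return 1.0 / (1.0 + std::exp(-x)); }

// One pass over the rows of U (m x k, row-major); V is n x k, Y is m x n.
void update_U(std::vector<double>& U, const std::vector<double>& V,
              const std::vector<double>& Y, int m, int n, int k, double lr) {
  for (int i = 0; i < m; ++i) {
    std::vector<double> grad(k, 0.0);
    for (int j = 0; j < n; ++j) {
      double z = 0.0;
      for (int f = 0; f < k; ++f) z += U[i * k + f] * V[j * k + f];
      const double resid = inv_logit(z) - Y[i * n + j];  // dloss/dz = P_ij - y_ij
      for (int f = 0; f < k; ++f) grad[f] += resid * V[j * k + f];
    }
    for (int f = 0; f < k; ++f) {
      const double u = U[i * k + f] - lr * grad[f];
      U[i * k + f] = u > 0.0 ? u : 0.0;  // project onto the non-negative orthant
    }
  }
}

int main() {
  // Toy 2 x 2 example: rank-1 factors, fixed V, 100 gradient steps on U.
  const int m = 2, n = 2, k = 1;
  std::vector<double> Y = {1, 0, 0, 1};
  std::vector<double> U = {0.5, 0.5}, V = {0.8, 0.2};
  for (int it = 0; it < 100; ++it) update_U(U, V, Y, m, n, k, 0.1);
  std::printf("U = [%.3f, %.3f]\n", U[0], U[1]);
  return 0;
}
```

Note that the inner sweep visits every $(i,j)$ pair, zeros included: unlike the least-squares normal equations, the logistic gradient cannot be accumulated from the nonzeros alone, so sparsity buys you much less here.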

GScottAlder commented 4 days ago

Hello,

Thanks for getting back to me (and sorry for my late response). I figured this wouldn't be an easy change, but thanks for sharing the files that would need to be modified for this feature. Someday I might get around to trying this, but for now I'll probably just use the library as-is. It's a big improvement over my current method (irlba followed by nsprcomp to rotate the solution).