Closed zdebruine closed 1 year ago
Addressed in 969c62eb4ff47ce7035bd8d534500ecc072e77df and 22953f8a4783b8d4bd4e5a098050d24d33d0de9a, upper bound now can be imposed in the nnls
, predict.nmf
, and project
functions.
Theoretical advantages remain to be seen.
Hi Zach, super quick implementation! Thanks!! I am compiling the package on my Mac (using clang) and I am getting:
In file included from ../inst/include/RcppML.hpp:17:
../inst/include/RcppML/nmf.hpp:100:21: error: cannot assign to variable 'upper_bound' with const-qualified type 'const double'
upper_bound = upper_bound;
which then fails to build the package. This doesn't happen in the CRAN version. Will check on a Win Server whether this persists.
No issues to build on Win. I'll check whether a different compiler might help on mac.
I'm not on a Mac but tried to address this with 4f670544313e8b2c44d423c31553384d21ece9dc removing the const
specifier in the upperBound()
setter for the nmf
class.
Does the issue persist?
works! many thanks!!!!
Ok great. If you could confirm at some point that the results are similar to https://cran.r-project.org/web/packages/bvls/bvls.pdf
that would be awesome!
OK, so I played around a bit with a small system (900x5000). Here some findings:
the bound is not respected. there are values above the upper bound (500) in the solution.
Comparing to bvls, RcppML is magnitudes faster: 9s vs 256s
here a plot of the bvls vs RcppML parameters
here the same plot with setting all variables above 500 to 500.
also, it would be amazing if the bounds could be set for each variable separately as in the bvls case.
Reprex? I'm getting all.equal
and a perfect 1:1 plot in this case:
library(RcppML)
library(bvls)
data(hawaiibirds)
A <- hawaiibirds$counts
w <- RcppML::nmf(A, 10)@w
h <- project(w, A)
upper_bound <- 2
h_bounded_RcppML <- project(w, A, upper_bound = upper_bound)
h_bounded_bvls <- apply(as.matrix(A), 2, function(b) {
bvls::bvls(w, b, rep(0, ncol(w)), rep(upper_bound, ncol(w)))$x
})
df <- data.frame("RcppML" = as.vector(h_bounded_RcppML), "bvls" = as.vector(h_bounded_bvls))
library(ggplot2)
all.equal(df$RcppML, df$bvls)
ggplot(df, aes(bvls, RcppML)) + geom_point()
Hm, I'll look at how we could set bounds separately. That's getting a bit more complicated :)
I can confirm that I it's a perfect 1:1 in your example. I'll post mine in a bit. need to re-run it...
library(data.table)
library(RcppML)
library(ggplot2)
A <- readRDS("A_matrix.RDS")
B <- readRDS("B_vector.RDS")
bl = rep(0,ncol(A))
bu = rep(500,ncol(A))
system.time({
resnnls <- RcppML::nnls(a = crossprod(A), b= crossprod(A,B),cd_maxit = 1000, upper_bound = 500)
})
system.time({
resnnls2 = bvls(crossprod(A), crossprod(A,B), bl, bu)
})
resnnls2 <- data.table(resnnls2$x)
resnnls <- data.table(resnnls)
resnnls[,bvls:=resnnls2$V1]
resnnls[,RcppML:=V1]
#
library(ggplot2)
ggplot(resnnls)+geom_point(aes(bvls,RcppML))
with Matrix.zip
no out of bounds, but not 1:1. We'll try to reproduce the out of bounds.
First look, seems platform-dependent. On Linux I got out-of-bounds, on Windows I didn't, I have no clue why this is. Will look in a bit.
Finally got a good look at this, and I think that the code .
Here's what I get when I run your example:
Clearly not identical answers.
But now let's calculate the squared error of both solutions:
err_bvls_bvls <- sum((B - crossprod(t(A), as.matrix(resnnls$bvls)))^2)
err_RcppML_bvls <- sum((B - crossprod(t(A), as.matrix(resnnls$RcppML)))^2)
err_bvls_bvls
# 509807.1
err_RcppML_bvls
# 461442.1
If my math is correct and all, RcppML is actually doing better. I'd be curious if this is a strange edge case or if it holds true generally. Interested to get your feedback.
Closing due to inactivity and lack of theoretical clarity on how BVLS can be used within NMF.
I'm not even sure BVLS-NMF is weakly convex. For NNLS projections, BVLS can lead to suboptimal solutions with limited utility, it appears that different algorithms can discover different minima (i.e. my coordinate descent BVLS vs. the 1979 BVLS implementation in Fortran using active set methods). Further discussion welcome. I am tentatively planning to isolate BVLS in it's own function in the next major release and remove it's support in NMF or NNLS functions to simplify the codebase.
Feature request to impose upper bound on NNLS solutions.