zdebruine / RcppML

Rcpp Machine Learning: Fast robust NMF, divisive clustering, and more
GNU General Public License v2.0

RcppML::project renamed? #25

Closed: ceesu closed this issue 2 years ago

ceesu commented 2 years ago

Hello there, thanks for your work on this package. I installed via devtools::install_github("zdebruine/RcppML") but don't seem to see the method 'project' mentioned in the vignette.

> packageVersion('RcppML')
[1] ‘0.5.2’
> RcppML::project(X, res@w)
Error: 'project' is not an exported object from 'namespace:RcppML'

Has it possibly been replaced by predict?

zdebruine commented 2 years ago

Yes, it has been replaced by predict.nmf, which does the same thing. Sorry for the breaking change, but I felt this terminology would be clearer for a broad user base.

?predict.nmf

predict.nmf is also an S4 method for the object returned by nmf, which is now an S4 class.

I have a lot of work to do cleaning up the documentation and methods, hopefully everything becomes more clear in the next few days.

ceesu commented 2 years ago

Thanks! One thing I am wondering about is the scale of the matrix returned by predict. I tried the following, where X and new_data are splits of the same dataset:

res <- RcppML::nmf(X, k=10, nonneg=TRUE)
new <- RcppML::predict(res, new_data)

However, I find that the values in new are on a much larger scale than those in res@h. Is there a transformation I can apply so that X and new_data have comparable representations in terms of the NMF components (nmf1, nmf2, etc.)?

zdebruine commented 2 years ago

Absolutely, and I'll add this info to the docs in the next release.

You can divide each factor in h by the sum of that factor, which normalizes every factor to sum to 1. The scaling diagonal is then the vector of row sums of your predicted h.

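library(RcppML)   # attach the package so data(movielens) can find the dataset
data(movielens)
X <- movielens$ratings[, 1:500]                                # training columns
new_data <- movielens$ratings[, 501:ncol(movielens$ratings)]   # held-out columns
res <- RcppML::nmf(X, k=10, nonneg=TRUE)
new <- RcppML::predict(res, new_data)   # one row per factor, one column per sample in new_data

# scaling diagonal: the total of each factor (row) in the predicted h
new_diag <- rowSums(new)
# divide each row by its total so every factor sums to 1
new_normalized <- new / new_diag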
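To see what the normalization step in the snippet above does without downloading any data, here is a small base-R sketch. It uses a made-up nonnegative matrix, `toy_h` (hypothetical, not real RcppML output), as a stand-in for the output of predict. The sketch checks that each normalized factor sums to 1 and that multiplying back by the row sums restores the original scale.

```r
# Toy stand-in for predict() output: k = 3 factors (rows) x 4 samples (columns).
# The values are arbitrary nonnegative numbers, not real RcppML output.
toy_h <- matrix(c(2, 0, 6,
                  4, 1, 0,
                  0, 3, 3,
                  2, 4, 1), nrow = 3)

# Scaling diagonal: one total per factor (row)
toy_d <- rowSums(toy_h)
print(toy_d)                    # 8 8 10

# Divide each row by its own total. R recycles the length-3 vector down the
# columns, so element [i, j] is divided by toy_d[i].
# (A factor whose row is all zeros would produce NaN here.)
toy_normalized <- toy_h / toy_d
print(rowSums(toy_normalized))  # 1 1 1

# Multiplying back by the diagonal restores the original scale
restored <- toy_normalized * toy_d
```

The division relies on R's column-major recycling. This only lines up with the rows because the vector length equals the number of rows.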

This is how diagonalized NMF by alternating least squares works: update, normalize, update, normalize, and so on.
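A minimal base-R sketch of that update/normalize loop, for a model A ≈ W diag(d) H, is below. It is not RcppML's implementation, and several pieces are assumptions:

- Nonnegativity is imposed by clipping the unconstrained least-squares solution at zero (projected ALS) instead of using a true NNLS solver.
- W is initialized from the first k columns of A.
- `diag_als` is a hypothetical helper name.
- The toy data is deliberately "separable" (the first two columns of A are the true factors), so the sketch recovers it exactly. Real data will not converge this cleanly.

```r
# Sketch of diagonalized alternating least squares: A ~ W %*% diag(d) %*% H.
# ASSUMPTIONS: clipping at zero stands in for NNLS, W is initialized from the
# first k columns of A, and no factor collapses to zero (solve() would then
# fail on a singular system).
diag_als <- function(A, k, iters = 20) {
  W <- A[, seq_len(k), drop = FALSE]  # hypothetical initialization heuristic
  for (it in seq_len(iters)) {
    # update H with W fixed: solve (W'W) H = W'A, then clip negatives
    H <- pmax(solve(crossprod(W), crossprod(W, A)), 0)
    # normalize: each factor (row) of H sums to 1, its scale goes into d
    d <- rowSums(H)
    H <- H / d
    # update W with H fixed: solve (H H') W' = H A', then clip negatives
    W <- t(pmax(solve(tcrossprod(H), tcrossprod(H, A)), 0))
    # normalize: each column of W sums to 1, its scale goes into d
    d <- colSums(W)
    W <- sweep(W, 2, d, "/")
  }
  list(w = W, d = d, h = H)
}

# Toy separable data: the first two columns of H0 form the identity
W0 <- matrix(c(3, 1, 0, 2,
               0, 2, 4, 1), nrow = 4)
H0 <- matrix(c(1, 0,
               0, 1,
               2, 1,
               1, 3), nrow = 2)
A <- W0 %*% H0

fit <- diag_als(A, k = 2)
recon <- fit$w %*% diag(fit$d) %*% fit$h
print(max(abs(recon - A)))  # reconstruction error
print(fit$d)                # factor scales
```

After the loop, both w and h are normalized, and all of the scale lives in d. This is why a freshly predicted h, which has not been through a normalize step, looks much larger than model@h.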

You should think about how the model ought to be normalized in your application. If each factor needs to be weighted equally, use the normalized form. If not, you may want to multiply your NMF model@h by the scaling diagonal (i.e. Diagonal(x = model@d) %*% model@h, where Diagonal() comes from the Matrix package).
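The sketch below shows what multiplying by the scaling diagonal does, along with two base-R forms that give the same result. Here `d` and `h` are hypothetical stand-ins for model@d and model@h, not values from a real model.

```r
library(Matrix)  # provides Diagonal(); Matrix ships with standard R installations

# Hypothetical stand-ins for model@d and model@h (k = 2 factors, 3 samples)
d <- c(10, 4)
h <- matrix(c(0.2, 0.5,
              0.3, 0.25,
              0.5, 0.25), nrow = 2)  # each row sums to 1

# Three equivalent ways to put the factor scales back into h
scaled_matrix <- as.matrix(Diagonal(x = d) %*% h)  # Matrix package, as in the reply
scaled_base   <- diag(d) %*% h                      # base R dense diagonal
scaled_recyc  <- d * h                              # recycling: row i times d[i]

print(scaled_recyc)          # rows: 2 3 5 and 2 1 1
print(rowSums(scaled_recyc)) # equals d: 10 4
```

The recycling form avoids building a k x k matrix at all. It also sidesteps a base-R pitfall: for a k = 1 model, `diag(d)` with a single number builds an identity matrix of that size rather than a 1 x 1 diagonal.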

ceesu commented 2 years ago

Great, thank you!