Rozenn opened this issue 10 years ago
You can use the LS-NMF method, which minimizes the following objective function:

| Z * (X - WH) |^2

where Z is a weight matrix of the same dimensions as the target matrix X, and * is the entry-wise matrix product (Hadamard product).
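For reference, the objective above can be evaluated directly in base R. This is a sketch on random matrices (the names X, W, H, Z and the sizes are illustrative, not package code); it also checks that, when Z has constant columns, the objective reduces to a per-column weighting of the residuals:

```r
set.seed(1)
n <- 10; p <- 8; r <- 3
X <- matrix(runif(n * p), n, p)   # target matrix
W <- matrix(runif(n * r), n, r)   # basis matrix
H <- matrix(runif(r * p), r, p)   # coefficient matrix
w <- runif(p)                     # one weight per sample/column
Z <- matrix(w, n, p, byrow = TRUE)  # weight matrix with constant columns

# LS-NMF objective: squared Frobenius norm of the Hadamard-weighted residuals
obj <- sum((Z * (X - W %*% H))^2)

# Equivalent column-weighted form: each column's residual norm scaled by w_j^2
obj2 <- sum(w^2 * colSums((X - W %*% H)^2))
all.equal(obj, obj2)  # TRUE
```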
So in your case, you could use a weight matrix with constant columns, whose entries are given by the sample weights:
library(NMF)
## random data
# target
x <- rmatrix(43, 1624)
# sample weights
w <- runif(ncol(x))
# weight matrix
Z <- matrix(w, nrow(x), ncol(x), byrow=TRUE)
#
# fit (limiting max number of iteration for the example)
res <- nmf(x, 3, 'ls-nmf', weight = Z, .opt = 'v2', maxIter = 200)
res
Please let me know if this solves your problem. Thank you.
I have solved the problem, thanks. But I would like to estimate the factorization rank with

nmfEstimateRank(x, 1:5, method = 'ls-nmf', weight = Z, .opt = 'v2')

using the matrix Z defined as you described above, on my data:

x <- t(tab_qte[, 1:42])   # table with 42 rows and 2624 columns
w <- rep(ad_weight, nrow(x))   # vector with weights per individual
Z <- matrix(w, nrow(x), ncol(x), byrow = TRUE)   # matrix of weights for each individual

But R crashed and I had to close it. Is it possible to calculate the quality measures for each rank k with this method?

Thank you in advance,
Rozenn.
When you say "I have solve the problem" do you mean my suggestion works fine for you?
Yes, running the rank survey will give you the quality measures for each rank, but you say you got an error. Try starting from rank 2, since rank 1 may cause issues:

res <- nmf(x, 2:5, method = 'ls-nmf', weight = Z, .opt = 'v2')
plot(res)
I have read Wang et al.'s paper "LS-NMF: a modified non-negative matrix factorization algorithm utilizing uncertainty estimates" and I think your suggestion works for me. I just want to be sure: when we introduce "uncertainty estimates", is it equivalent to giving a weight to each individual in the analysis? For example, in PCA the individual weight is usually 1 for all individuals, but we can run the analysis with different weights via an argument of the function. If the weight for the first individual is 2, it corresponds to duplicating that individual in the database for the analysis. I am not sure it is the same objective when I use LS-NMF. I am not sure I am being clear...

Concerning my issue with R, I tried your example:

x <- rmatrix(43, 2624)
w <- runif(ncol(x))
# weight matrix
Z <- matrix(w, nrow(x), ncol(x), byrow = TRUE)
res <- nmf(x, 2:4, 'ls-nmf', weight = Z, .opt = 'v2', maxIter = 200)
plot(res)

The software runs this analysis, but on my data it crashes! I don't understand.

x <- t(tab_qte[, 1:42])   # 42 rows, 2624 columns
Sq <- diag(apply(x, 1, sd))   # matrix of standard deviations
Sq_inv <- diag(1 / apply(x, 1, sd))
x <- Sq_inv %*% x   # to scale the data
w <- rep(ad_weight, nrow(x))
Z <- matrix(w, nrow(x), ncol(x), byrow = TRUE)   # matrix of weights for each individual
res <- nmf(x, 2:4, 'ls-nmf', weight = Z, .opt = 'v2', maxIter = 200)
res

Here is the error before the software crashes (it completes 30 runs for k = 2 but stops after):

Compute NMF rank= 3 ... NMF algorithm: 'ls-nmf'
Multiple runs: 30
foreach environment: try-parallel [par]

Thank you very much for your help,
Rozenn
Try running with nrun = 1; this will tell us whether the issue comes from the parallel computations. Which version of the package are you running? There was an issue due to changes in doParallel, which I fixed in the latest version on CRAN (you would need >= 0.20.2). Make sure to update foreach and doParallel as well.
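To compare installed versions against the minimums mentioned above, a base R sketch (the package names are from this thread; nothing here is NMF-specific API):

```r
# Report installed versions of the packages involved in the parallel runs.
for (pkg in c("NMF", "foreach", "doParallel")) {
  if (requireNamespace(pkg, quietly = TRUE)) {
    cat(pkg, as.character(packageVersion(pkg)), "\n")
  } else {
    cat(pkg, "not installed\n")
  }
}
# NMF should report >= 0.20.2 for the doParallel fix mentioned above.
```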
About the weights, I understand what you mean. Here are my thoughts on this; let me know if you agree with the rationale. Suppose you had data for each individual in your population. The term of the objective function associated with sample/column j and feature/row i is:

[ x_ij - sum_{k=1}^r w_ik h_kj ]^2

Gathering the samples from a given stratum S in your population gives:

[ sum_{j in S} x_ij - sum_{k=1}^r w_ik (sum_{j in S} h_kj) ]^2

If now you only have a representative sample of S, one can assume that it represents the average individual in S: sum_{j in S} x_ij = n_S x_iS, where n_S is the number of samples in S. You are then reduced to looking for a mean contribution term h_kS = sum_{j in S} h_kj / n_S for S. So the term for S becomes:

[ n_S x_iS - sum_{k=1}^r w_ik n_S h_kS ]^2

which is:

[ n_S (x_iS - sum_{k=1}^r w_ik h_kS) ]^2

i.e. the LS-NMF term with weight n_S. This applies to every row i and every stratum.
Hope there is no logical bug in this :(
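The algebra above can be sanity-checked numerically in base R. A minimal sketch with made-up small values (the names n_S, x_iS, w_ik, h_kS mirror the derivation; none of this is package code):

```r
set.seed(42)
r <- 3
n_S  <- 5          # number of individuals in stratum S
x_iS <- runif(1)   # average value of feature i over S
w_ik <- runif(r)   # basis weights for feature i
h_kS <- runif(r)   # mean contribution of S to each of the r factors

# Stratum term with the summed contributions (n_S * h_kS)
lhs <- (n_S * x_iS - sum(w_ik * (n_S * h_kS)))^2
# LS-NMF term with weight n_S applied to the mean residual
rhs <- (n_S * (x_iS - sum(w_ik * h_kS)))^2

all.equal(lhs, rhs)  # TRUE: the stratum term is the LS-NMF term with weight n_S
```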
Thanks for your explanation, I have followed your reasoning! I will discuss it with other members of my team and come back to you if I have an issue.
Rozenn
Yes, it runs with nrun = 1 but not with more runs. NMF is version 0.20.5, doParallel 1.0.8 and foreach 1.4.1. Here is the error:
res <- nmf(x, 2:4, 'ls-nmf', weight = Z, .opt = 'v2', maxIter = 200, nrun = 3)
Compute NMF rank= 2 ... NMF algorithm: 'ls-nmf'
Multiple runs: 3
Setting up requested foreach environment: try-parallel [par]
Check host compatibility ... OK
Registering backend doParallel ... OK
Setting up RNG ... OK
Using foreach backend: doParallelSNOW [version 1.0.8]
Mode: parallel (3/4 core(s))
Check shared memory capability ... SKIP [disabled]
Runs: error calling combine function: ... DONE
ERROR Timing stopped at: 1.31 0.57 20.32
ERROR
Compute NMF rank= 3 ... NMF algorithm: 'ls-nmf'
Multiple runs: 3
foreach environment: try-parallel [par]
doParallel ... OK
Mode: parallel (3/4 core(s))
Runs: error calling combine function: ... DONE
ERROR Timing stopped at: 1.11 0.53 8.53
ERROR
Compute NMF rank= 4 ... NMF algorithm: 'ls-nmf'
Multiple runs: 3
foreach environment: try-parallel [par]
doParallel ... OK
Mode: parallel (3/4 core(s))
Runs: error calling combine function: ... DONE
ERROR Timing stopped at: 1.06 0.52 8.3
ERROR
Error in (function (...) : All the runs produced an error:
-#1 [r=2] -> NMF::nmf - Unexpected error: no partial result seem to have been saved.
-#2 [r=3] -> NMF::nmf - Unexpected error: no partial result seem to have been saved.
-#3 [r=4] -> NMF::nmf - Unexpected error: no partial result seem to have been saved.
plot(res)
Warning messages:
1: Removed 3 rows containing missing values (geom_path).
2: Removed 3 rows containing missing values (geom_path).
3: Removed 3 rows containing missing values (geom_path).
4: Removed 6 rows containing missing values (geom_path).
5: Removed 6 rows containing missing values (geom_path).
6: Removed 3 rows containing missing values (geom_point).
7: Removed 3 rows containing missing values (geom_point).
8: Removed 3 rows containing missing values (geom_point).
9: Removed 6 rows containing missing values (geom_point).
10: Removed 6 rows containing missing values (geom_point).
Rozenn
Ok, we are getting more info here. I would leave out the multi-rank estimation for now; the error should also appear on a normal parallel run. Can you please run:
sessionInfo()
#**********************************************************
nmfCheck('ls-nmf', 3, weight = 1, nrun = 2, .opt='d')
#**********************************************************
nmf(x, 2, 'ls-nmf', weight = Z, .opt = 'd', maxIter = 200, nrun = 1)
#**********************************************************
nmf(x, 2, 'ls-nmf', weight = Z, .opt = 'd', maxIter = 200, nrun = 2)
save(x, Z, file = 'nmf-data.rda')
Thank you.
Hi, I have sent you the data and the output. Have you received them? I have checked on my own computer (not my computer at work) and I still get an error. Do you have an idea how to solve the problem? Thanks in advance.
Rozenn
Dear Rozenn, I use NMF on my sparse matrix (4000 x 369); the command I use is:

estim.r <- nmf(mydata, 2, nrun = 1, .opt = "v3")

I got the error below:
Runs: ... DONE
# Processing partial results ... ERROR
Error: NMF::nmf - Unexpected error: no partial result seem to have been saved.
Timing stopped at: 0.65 0.03 643.5
# NMF computation exit status ... ERROR
## Running rollback clean up ...
# Restoring RNG settings ... OK
# Restoring NMF options ... OK
# Restoring previous foreach backend '' ... OK
# Deleting temporary directory 'C:/Users/lenovo/Documents\NMF_177459a7f46' ... OK
I wonder if my matrix is too big? When I used a smaller matrix from another project, it worked well. If the size of the matrix is not a problem, is there anything else that may generate the error above, besides missing values, infinite values, rows full of zeros, or null/NA/infinite weights?
By the way, NMF clusters columns, right? So we need to make sure that no column is all zeros, rather than checking each row?
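A quick base R check for all-zero rows or columns before calling nmf(). This is a sketch on a random stand-in matrix (x here is illustrative, not the actual data from this thread):

```r
set.seed(7)
# Stand-in for a sparse non-negative count matrix
x <- matrix(rpois(20 * 6, lambda = 1), 20, 6)

zero_rows <- which(rowSums(x) == 0)   # features with no signal
zero_cols <- which(colSums(x) == 0)   # samples with no signal

# Drop empty rows before factorization, if any
if (length(zero_rows)) x <- x[-zero_rows, , drop = FALSE]
```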
Hi,
can you please tell me which version of the package and OS are you using?
Matrix size should not be an issue. The command and the log you sent do not seem to match: the command is for a single run, while the log is produced by a parallel multi-run. This makes it difficult to help.
Thank you very much for your reply. The package is NMF 0.20.5; the system I use is Win7. xin is my data matrix. I use the command xin[is.na(xin)] <- 0 to convert any possible NA to 0s, just in case. Then I run again and get this error:

estim <- nmf(xin, 2, nrun = 1)
Warning message:
In .local(x, rank, method, ...) :
  NMF residuals: final objective value is NA
consensusmap(estim)
Error in `rownames<-`(`*tmp*`, value = c(1L, 0L)) :
  length of 'dimnames' [1] not equal to array extent

Because of this error I cannot draw the consensus map, which I need in order to see which documents are clustered together. PS: the matrix is not only big but also really sparse; could that be the cause of the error? And since the matrix is really big, neither R nor Excel can display all of its elements, so I cannot check whether the matrix is suitable for NMF (NA or infinite values). (But it is a term-by-document matrix, so it shouldn't contain those.)
Really appreciate your help!
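The data-quality checks mentioned above can be run without displaying the matrix. A base R sketch, where xin stands in for the real data (here a random matrix of the same shape as described in this thread):

```r
set.seed(3)
xin <- matrix(runif(4000 * 369), 4000, 369)  # stand-in for the 4000 x 369 matrix

anyNA(xin)              # TRUE if any missing value
any(!is.finite(xin))    # TRUE if any Inf/NaN/NA entry
any(rowSums(xin) == 0)  # TRUE if some row is entirely zero
range(xin)              # min/max without printing the whole matrix
```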
Dear Renaud,
I generated another matrix and want to use NMF, but it still produces an error:

# NMF computation exit status ... ERROR
## Running rollback clean up ...
# Restoring RNG settings ... OK
# Restoring NMF options ... OK
# Restoring previous foreach backend '' ... OK
# Deleting temporary directory 'C:/Users/lenovo/Documents\NMF1edc6c305471' ... OK
ERROR
Error in (function (...) : All the runs produced an error:

The command I use is:

NMFfinal <- nmf(t(Book1), 2:4, nrun = 2, .opt = "v3")   # I transpose here because I want to cluster the rows

I have also tried adding noise; it produces an error as well:

res. <- res.impute + rmatrix(res.impute, max = 10^-4)
Error: evaluation nested too deeply: infinite recursion / options(expressions=)?
Error during wrapup: evaluation nested too deeply: infinite recursion / options(expressions=)?
The attachment is my data, just in case you want to know what is wrong with my code. Thanks a lot.
Best, Xin
Hi,
I am currently working on data with 1624 individuals and 43 variables. I would like to run an analysis with a weight per individual. Is that possible with your package? Do you have an idea how to take individual weights into account, as PCA or factorial analysis does in some other packages? My data come from a sample which has to be representative of the whole studied population, thanks to this "weight" variable.
Thanks, Rozenn