renozao / NMF

NMF: A Flexible R package for Nonnegative Matrix Factorization

NMF with individual weight #8

Open Rozenn opened 10 years ago

Rozenn commented 10 years ago

Hi,

I am currently working on a data set with 1624 individuals and 43 variables. I would like to run an analysis with one weight per individual. Is this possible in your package? Do you have an idea of how to take individual weights into account, as PCA or factor analysis does in some other packages? My data come from a sample that has to be representative of the whole population studied, which is achieved through this "weight" variable.

Thanks, Rozenn

renozao commented 10 years ago

You can use the method LS-NMF, which minimizes the following objective function:

| Z * (X - WH) |^2

where Z is a weight matrix of the same dimension as the target matrix X, and * is the entry-wise matrix product (Hadamard product). So in your case, you could use a weight matrix that has constant columns, each column being filled with the weight of the corresponding sample:

library(NMF)
## random data
# target
x <- rmatrix(43, 1624)
# sample weights
w <- runif(ncol(x))
# weight matrix
Z <- matrix(w, nrow(x), ncol(x), byrow=TRUE)
# fit (limiting the maximum number of iterations for the example)
res <- nmf(x, 3, 'ls-nmf', weight = Z, .opt = 'v2', maxIter = 200)
res
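To make the objective above concrete, here is a minimal base-R sketch (it does not call the NMF package). It assumes the objective exactly as written above, |Z * (X - WH)|^2, and uses small made-up dimensions; the names `weighted_obj`, `obj_direct` and `obj_by_sample` are illustrative only. It checks that a weight matrix built with `byrow = TRUE` has constant columns, and that the objective then splits into one term per sample.

```r
set.seed(1)

# Toy dimensions (hypothetical): 5 features in rows, 8 samples in columns
n_feat <- 5; n_samp <- 8; r <- 2
X <- matrix(runif(n_feat * n_samp), n_feat, n_samp)

# One weight per sample, repeated down each column
w <- runif(n_samp)
Z <- matrix(w, n_feat, n_samp, byrow = TRUE)

# Every column of Z should hold a single repeated value
cols_constant <- all(apply(Z, 2, function(col) length(unique(col)) == 1))

# Arbitrary nonnegative factors, only used to evaluate the objective
W <- matrix(runif(n_feat * r), n_feat, r)
H <- matrix(runif(r * n_samp), r, n_samp)

# Objective as written in the comment above: squared norm of Z * (X - WH)
weighted_obj <- function(X, W, H, Z) sum((Z * (X - W %*% H))^2)
obj_direct <- weighted_obj(X, W, H, Z)

# Same quantity computed sample by sample: w_j^2 times the squared residual of column j
R <- X - W %*% H
obj_by_sample <- sum(w^2 * colSums(R^2))
```

Under this reading of the objective, the weight of sample j multiplies that sample's squared residual by w_j^2, which may matter when deciding what scale the weights should be on.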

Please let me know if this solves your problem. Thank you.

Rozenn commented 10 years ago

I have solved the problem, thanks. But I would like to estimate the factorization rank with

nmfEstimateRank(x, 1:5, method = 'ls-nmf', weight = Z, .opt = 'v2')

with the matrix Z built on my data as you described above:

x = t(tab_qte[,1:42])                        # table with 42 rows and 2624 columns
w = rep(ad_weight, nrow(x))                  # vector with weights per individual
Z = matrix(w, nrow(x), ncol(x), byrow=TRUE)  # matrix of weights for each individual

But R crashed and I had to close it. Is it possible to calculate the quality measures for each rank k with this method?

Thank you in advance,

Rozenn.

renozao commented 10 years ago

When you say "I have solve the problem" do you mean my suggestion works fine for you?

Yes, running the rank survey will give you the quality measures for each rank, but you say you got an error. Try starting from rank = 2, since a rank 1 may cause issues:

res <- nmf(x, 2:5, method = 'ls-nmf',weight = Z, .opt = 'v2' )
plot(res)

Rozenn commented 10 years ago

I have read Wang et al.'s paper "LS-NMF: a modified non-negative matrix factorization algorithm utilizing uncertainty estimates", and I think that your suggestion is fine for me. I just want to be sure: when we introduce "uncertainty estimates", is it equivalent to giving a weight to each individual in the analysis? For example, in PCA the individual weight is usually 1 for all individuals, but we can run the analysis with different weights thanks to an argument of the function. If the weight of the first individual is 2, it corresponds to duplicating that individual in the database for the analysis. I am not sure the objective is the same when I use LS-NMF. I am not sure I am being clear...

Concerning my issue with R, I tried your example:

x = rmatrix(43, 2624)
# sample weights
w = runif(ncol(x))
# weight matrix
Z = matrix(w, nrow(x), ncol(x), byrow=TRUE)
res = nmf(x, 2:4, 'ls-nmf', weight = Z, .opt = 'v2', maxIter = 200)
plot(res)

R runs the analysis, but on my data it crashes! I don't understand.

x = t(tab_qte[,1:42])            ## 42 rows, 2624 columns
Sq = diag(apply(x,1,sd))         ## matrix of sd
Sq_inv = diag(1/apply(x,1,sd))
x = Sq_inv %*% x                 # to scale the data (unit standard deviation)
w = rep(ad_weight, nrow(x))
Z = matrix(w, nrow(x), ncol(x), byrow=TRUE)  # matrix of weights for each individual
res = nmf(x, 2:4, 'ls-nmf', weight = Z, .opt = 'v2', maxIter = 200)
res

Here is the output before R crashes (it completes the 30 runs for k=2 but stops after that):

Run: 30/30 NMF algorithm: 'ls-nmf' NMF seeding method: random Iterations: 200/200 DONE (stopped at 200/200 iterations) 

NMF computation exit status ... OK 

DONE ... DONE System time: user system elapsed 68.06 6.84 74.98 

NMF computation exit status ... OK + measures ... ERROR 

Compute NMF rank= 3 ... NMF algorithm: 'ls-nmf' Multiple runs: 30 

Setting up requested foreach environment: try-parallel [par]
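One thing that might be worth ruling out with the row scaling used above (this is a guess, not a diagnosis of the crash): if any variable has a standard deviation of zero, `1/sd` is infinite and the scaled matrix contains non-finite values. The sketch below uses made-up toy data and base R only; dropping the constant rows is just one possible choice, and the names `row_sd`, `keep` and `x_scaled` are illustrative.

```r
set.seed(42)

# Toy data (hypothetical): 4 variables in rows, 6 individuals in columns
x <- matrix(runif(4 * 6), 4, 6)
x[2, ] <- 0.5                      # a constant variable, so its sd is exactly 0

row_sd <- apply(x, 1, sd)

# Scaling as in the code above: the constant row becomes non-finite
x_scaled_naive <- diag(1 / row_sd) %*% x
naive_has_nonfinite <- any(!is.finite(x_scaled_naive))

# Guarded version: keep only the rows with a positive sd, then divide each row by its sd
keep <- row_sd > 0
x_scaled <- x[keep, , drop = FALSE] / row_sd[keep]
```

Dividing a matrix by a vector of length `nrow` recycles down the columns, so each row is divided by its own standard deviation without building a diagonal matrix.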

Thank you very much for your help,

Rozenn

Message of 19/03/14 08:53 From: "Renaud" To: "renozao/NMF" Cc: "Rozenn" Subject: Re: [NMF] NMF with individual weight (#8)


renozao commented 10 years ago

Try running it with nrun = 1; this will tell us whether the issue comes from the parallel computations. Which version of the package are you running? There was an issue due to changes in doParallel, which I fixed in the latest version on CRAN (you would need >= 0.20.2). Make sure to update foreach and doParallel as well.

renozao commented 10 years ago

About the weights, I understand what you mean. Here are my thoughts on this; let me know if you agree with the rationale. Suppose you had data for each individual in your population. The term of the objective function associated with sample/column j and feature/row i is:

[ x_{ij} - \sum_{k=1}^{r} w_{ik} h_{kj} ]^2

Gathering the samples from a given stratum S in your population gives:

[ \sum_{j \in S} x_{ij} - \sum_{k=1}^{r} w_{ik} ( \sum_{j \in S} h_{kj} ) ]^2

If you now only have a representative sample of S, one could assume that it represents the average individual in S: \sum_{j \in S} x_{ij} = n_S x_{iS}, where n_S is the number of samples in S. You are then reduced to looking for a mean contribution term h_{kS} = \sum_{j \in S} h_{kj} / n_S for S. So the term for S becomes:

[ n_S x_{iS} - \sum_{k=1}^{r} w_{ik} n_S h_{kS} ]^2

which is:

[ n_S ( x_{iS} - \sum_{k=1}^{r} w_{ik} h_{kS} ) ]^2

i.e. the ls-nmf term with weight n_S. This applies to every row i and every stratum S.

Hope there is no logical bug in this :(
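For comparison with the duplication reading of weights raised earlier in the thread (weight 2 meaning the individual appears twice), here is a short worked derivation. It assumes the objective exactly as written at the top of the thread, |Z * (X - WH)|^2, with constant columns z_{ij} = z_j, and it assumes that identical copies of an individual share the same column of H; n_j denotes the hypothetical number of copies of individual j.

```latex
\begin{align*}
% Weighted objective with one weight z_j per individual (column) j
\left\| Z \ast (X - WH) \right\|^2
  &= \sum_{j} z_j^{2} \sum_{i} \Big( x_{ij} - \sum_{k=1}^{r} w_{ik} h_{kj} \Big)^{2} \\[4pt]
% Unweighted objective on a data set where individual j is repeated n_j times
\sum_{j} \sum_{c=1}^{n_j} \sum_{i} \Big( x_{ij} - \sum_{k=1}^{r} w_{ik} h_{kj} \Big)^{2}
  &= \sum_{j} n_j \sum_{i} \Big( x_{ij} - \sum_{k=1}^{r} w_{ik} h_{kj} \Big)^{2}
\end{align*}
```

Under these assumptions the two expressions coincide when z_j = \sqrt{n_j}. This is a different reading from the stratum-aggregation argument above, so which scale to use depends on what the weights are meant to represent.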

Rozenn commented 10 years ago

Thanks for your explanation, I have followed your reasoning! I will discuss it with other members of my team and come back to you if I have an issue.

Rozenn

Message of 19/03/14 12:11 From: "Renaud" To: "renozao/NMF" Cc: "Rozenn" Subject: Re: [NMF] NMF with individual weight (#8)


Rozenn commented 10 years ago

Yes, it runs with 'nrun=1' but not with more runs. NMF is at version 0.20.5, doParallel at 1.0.8 and foreach at 1.4.1. Here is the error:

res <- nmf(x, 2:4,'ls-nmf', weight = Z, .opt = 'v2', maxIter = 200,nrun = 3)
Compute NMF rank= 2  ...
NMF algorithm: 'ls-nmf'
Multiple runs: 3

Setting up requested foreach environment: try-parallel [par]

Check host compatibility ... OK

Registering backend doParallel ... OK

Setting up RNG ... OK

Using foreach backend: doParallelSNOW [version 1.0.8]

Mode: parallel (3/4 core(s))

Check shared memory capability ... SKIP [disabled]

Runs: error calling combine function:

 ... DONE ERROR Timing stopped at: 1.31 0.57 20.32 

NMF computation exit status ... ERROR

ERROR Compute NMF rank= 3  ...  NMF algorithm: 'ls-nmf' Multiple runs: 3

Setting up requested foreach environment: try-parallel [par]

Check host compatibility ... OK

Registering backend doParallel ... OK

Setting up RNG ... OK

Using foreach backend: doParallelSNOW [version 1.0.8]

Mode: parallel (3/4 core(s))

Check shared memory capability ... SKIP [disabled]

Runs: error calling combine function:

 ... DONE ERROR Timing stopped at: 1.11 0.53 8.53 

NMF computation exit status ... ERROR

ERROR Compute NMF rank= 4  ...  NMF algorithm: 'ls-nmf' Multiple runs: 3

Setting up requested foreach environment: try-parallel [par]

Check host compatibility ... OK

Registering backend doParallel ... OK

Setting up RNG ... OK

Using foreach backend: doParallelSNOW [version 1.0.8]

Mode: parallel (3/4 core(s))

Check shared memory capability ... SKIP [disabled]

Runs: error calling combine function:

 ... DONE ERROR Timing stopped at: 1.06 0.52 8.3 

NMF computation exit status ... ERROR

ERROR
Error in (function (...)  : All the runs produced an error:
 -#1 [r=2] -> NMF::nmf - Unexpected error: no partial result seem to have been saved.
 -#2 [r=3] -> NMF::nmf - Unexpected error: no partial result seem to have been saved.
 -#3 [r=4] -> NMF::nmf - Unexpected error: no partial result seem to have been saved.

plot(res)
Warning messages:
1: Removed 3 rows containing missing values (geom_path).
2: Removed 3 rows containing missing values (geom_path).
3: Removed 3 rows containing missing values (geom_path).
4: Removed 6 rows containing missing values (geom_path).
5: Removed 6 rows containing missing values (geom_path).
6: Removed 3 rows containing missing values (geom_point).
7: Removed 3 rows containing missing values (geom_point).
8: Removed 3 rows containing missing values (geom_point).
9: Removed 6 rows containing missing values (geom_point).
10: Removed 6 rows containing missing values (geom_point).

Rozenn

Message of 19/03/14 11:33 From: "Renaud" To: "renozao/NMF" Cc: "Rozenn" Subject: Re: [NMF] NMF with individual weight (#8)


renozao commented 10 years ago

Ok. We are getting more info here. I would leave out the multi-rank run for now. The error should also appear on a normal parallel run. Can you please run the following and send me the output:

sessionInfo()
#**********************************************************
nmfCheck('ls-nmf', 3, weight = 1, nrun = 2, .opt='d')
#**********************************************************
nmf(x, 2, 'ls-nmf', weight = Z, .opt = 'd', maxIter = 200, nrun = 1)
#**********************************************************
nmf(x, 2, 'ls-nmf', weight = Z, .opt = 'd', maxIter = 200, nrun = 2)

Thank you.

Rozenn commented 10 years ago

Hi, I have sent you the data and the output. Have you received them? I have checked on my own computer (not my computer at work) and I still get an error. Do you have an idea of how to solve the problem? Thanks in advance.

Rozenn

Neo9061 commented 10 years ago

Dear Rozenn, I use NMF on my sparse matrix (4000*369), the command I use is :

estim.r <- nmf(mydata,  2 ,nrun =1,.opt = "v3")

I got error as below:

Runs:  ... DONE
# Processing partial results ... ERROR
Error: NMF::nmf - Unexpected error: no partial result seem to have been saved.
Timing stopped at: 0.65 0.03 643.5 
# NMF computation exit status ... ERROR

## Running rollback clean up ... 
# Restoring RNG settings ... OK
# Restoring NMF options ... OK
# Restoring previous foreach backend '' ... OK
# Deleting temporary directory 'C:/Users/lenovo/Documents\NMF_177459a7f46' ... OK

I wonder if my matrix is too big? When I used a smaller matrix from another project, it worked well. If the size of the matrix is not the problem, is there anything else that may generate the error above, besides missing values, infinite values, rows full of zeros, or null/NA/infinite weights?

By the way, NMF clusters the columns, right? So I think we need to make sure that no column is all zeros, rather than no row?

renozao commented 10 years ago

Hi,

Can you please tell me which version of the package and which OS you are using?

Matrix size should not be an issue. The command and the log you sent do not seem to match: the command is for a single run, while the log is produced by a parallel multi-run. This makes it difficult to help.

Neo9061 commented 10 years ago

Thank you very much for your reply. The package is NMF 0.205, and the system I use is Win7. xin is my data matrix. I use the command xin[is.na(xin)] <- 0 to convert all possible NAs to 0, just in case. Then I ran it again and got this error:

estim <- nmf(xin, 2, nrun = 1)
Warning message:
In .local(x, rank, method, ...) :
  NMF residuals: final objective value is NA
consensusmap(estim)
Error in `rownames<-`(`*tmp*`, value = c(1L, 0L)) :
  length of 'dimnames' [1] not equal to array extent

Because of this error I cannot draw the consensus map, which I need in order to see which documents are clustered together. PS: the matrix is not only big but also really sparse; could that be what causes the error? And since the matrix is really big, I cannot display all of its elements in R or in an Excel file, so I cannot check whether the matrix is suitable for NMF, e.g. whether it contains NA or infinite values. (But it is a term-by-document matrix, so it should not contain any.)
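A matrix that is too large to inspect by eye can still be checked programmatically. The following is a minimal base-R sketch that counts the kinds of problems mentioned earlier in the thread (missing or infinite values, all-zero rows or columns) plus negative entries; `check_nmf_input` is a hypothetical helper written for this illustration, not a function of the NMF package, and the toy matrix is made up.

```r
# Hypothetical helper: summarise entries that commonly cause trouble for NMF
check_nmf_input <- function(x) {
  x <- as.matrix(x)
  list(
    n_na        = sum(is.na(x)),               # NA and NaN entries
    n_nonfinite = sum(!is.finite(x)),          # NA, NaN, Inf and -Inf entries
    n_negative  = sum(x < 0, na.rm = TRUE),    # NMF expects nonnegative data
    zero_rows   = which(rowSums(abs(x), na.rm = TRUE) == 0),
    zero_cols   = which(colSums(abs(x), na.rm = TRUE) == 0)
  )
}

# Toy matrix with one NA, one negative entry, one all-zero row and one all-zero column
toy <- matrix(c(1, 0,  2,
                0, 0,  0,
                3, 0, NA,
                4, 0, -1), nrow = 4, byrow = TRUE)
report <- check_nmf_input(toy)
```

On real data one would call `check_nmf_input(xin)` before replacing NAs, and look at `report$zero_rows` and `report$zero_cols` to see whether any term or document is empty.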

Really appreciate your help!

renozao commented 10 years ago

On 19 July 2014 15:51, Neo9061 notifications@github.com wrote:


Neo9061 commented 10 years ago

Dear Renaud,

I generated another matrix and wanted to run NMF on it, but it still produces an error.

The error is:

# NMF computation exit status ... ERROR
## Running rollback clean up ...
# Restoring RNG settings ... OK
# Restoring NMF options ... OK
# Restoring previous foreach backend '' ... OK
# Deleting temporary directory 'C:/Users/lenovo/Documents\NMF1edc6c305471' ... OK
ERROR
Error in (function (...) : All the runs produced an error:

The command I use is:

NMFfinal <- nmf(t(Book1),2:4, nrun =2,.opt = "v3")  # I transpose here because I want to cluster the rows

I have also tried adding noise, which generates an error as well:

res.<-res.impute + rmatrix(res.impute, max = 10^-4)
Error: evaluation nested too deeply: infinite recursion / options(expressions=)?
Error during wrapup: evaluation nested too deeply: infinite recursion / options(expressions=)?

The attachment is my data, just in case you want to know what is wrong with my code. Thanks a lot.

Best, Xin