renozao / NMF

NMF: A Flexible R package for Nonnegative Matrix Factorization

NMF Question #3

Closed mjbock closed 10 years ago

mjbock commented 10 years ago

I am hoping to use your NMF package for chemical fingerprinting. Basically, chemical data from a series of samples are used to determine the end-member compositions and the contribution of each end member to each sample. For the input matrix, rows are samples and columns are the chemical concentrations. The concentrations are normalized, meaning the sum of each row is 1. The NMF output should exhibit closure, meaning the rows of h should sum to 1. I have been reviewing the source code and have been unable to determine whether an option is available to impose closure (rows in h sum to 1). Is this implemented? If not, do you have any advice on how best to attempt this myself? I would consider myself an intermediate R user.

Thanks for your time, Mike
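
To make the closure requirement concrete, here is a small base-R sketch with made-up numbers. The matrices `w_true` and `h_true` are hypothetical stand-ins, not real fingerprinting data. It illustrates that if each row of h (an end-member composition) sums to 1 and each row of w (the mixing proportions for one sample) sums to 1, then every row of X = w h sums to 1 as well.

```r
set.seed(1)
n_samples <- 6; n_chem <- 5; k <- 3

# Hypothetical end-member compositions: k x n_chem, each row sums to 1
h_true <- matrix(runif(k * n_chem), k, n_chem)
h_true <- h_true / rowSums(h_true)

# Hypothetical mixing proportions: n_samples x k, each row sums to 1
w_true <- matrix(runif(n_samples * k), n_samples, k)
w_true <- w_true / rowSums(w_true)

# Mixed samples: each row is a convex combination of the end members
X <- w_true %*% h_true
rowSums(X)   # every entry is 1, up to floating-point error
```

The reverse direction is what the question asks for: given only X, recover a w and an h whose rows are closed in this sense.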

renozao commented 10 years ago

Hi Mike,

this type of constraint is not exactly implemented in any of the built-in algorithms, but I believe there are some NMF algorithms out there that allow for this. I can think of three ways of implementing it:

x <- rmatrix(20,10)
tmp <- nmf(t(x), 3, 'lee')
res <- t(tmp)
rowSums(coef(res))

setNMFMethod('mynmf', 'Frobenius', Update  = function(i, v, x, ...){

  # add complete re-scaling here

  # return updated model
  x
}, overwrite = TRUE)

# you can now call
nmf(x, 3, 'mynmf')

Please, let me know if this helped.

Renaud
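
As a rough illustration of what a "complete re-scaling" step inside the update could look like, here is a base-R sketch that does not use the NMF package API at all. `nmf_closed()` is a hypothetical helper written for this example, using the standard multiplicative updates for the Frobenius norm. After each update it divides every row of h by its sum and multiplies the matching column of w by the same factor, so the product w h is left unchanged.

```r
# Hypothetical helper, not part of the NMF package:
# multiplicative updates (Frobenius norm) with closure on the rows of h.
nmf_closed <- function(v, k, n_iter = 200, eps = 1e-9) {
  n <- nrow(v); p <- ncol(v)
  w <- matrix(runif(n * k), n, k)
  h <- matrix(runif(k * p), k, p)
  for (i in seq_len(n_iter)) {
    # standard multiplicative updates; eps guards against division by zero
    h <- h * (t(w) %*% v) / (t(w) %*% w %*% h + eps)
    w <- w * (v %*% t(h)) / (w %*% h %*% t(h) + eps)
    # re-scaling step: make each row of h sum to 1 and move the
    # scale into the matching column of w, so w %*% h is unchanged
    s <- pmax(rowSums(h), eps)
    h <- h / s
    w <- sweep(w, 2L, s, "*")
  }
  list(w = w, h = h)
}

set.seed(42)
x <- matrix(runif(20 * 10), 20, 10)
fit <- nmf_closed(x, 3)
rowSums(fit$h)   # each row sum is 1, up to floating-point error
```

Because the re-scaling leaves w h unchanged, it does not alter the fit itself. It only fixes the scale ambiguity between w and h, so this is a normalization rather than a constrained optimization.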

mjbock commented 10 years ago

I apologize for my late reply; my original reply got hung up in our e-mail system.

I used a modified approach that worked quite well. Rather than modify any of the NMF methods, I created a small procedure that scales the rows of h to sum to one and recalculates w:

PMF<-nmf(X,k,method='lee',seed='nndsvd')

w<-basis(PMF)
h<-coef(PMF)
h2<-sweep(h,1L,rowSums(h),"/",check.margin=FALSE)
w2<-H.c %*% ginv(h2)

That works like a charm and allows me to change the NMF calculation methods and still get what I want.

An embarrassingly simple solution. Thanks, Mike



renozao commented 10 years ago

Sure. I guess you meant X instead of H.c in your sample code.

Some comments though:

mjbock commented 10 years ago

Not quite, left out a line:

X = input matrix

k = number of end members

PMF<-nmf(X,k,method='lee',seed='nndsvd')
w<-basis(PMF)
h<-coef(PMF)
H.c<-w %*% h
h2<-sweep(h,1L,rowSums(h),"/",check.margin=FALSE)
w2<-H.c %*% ginv(h2)

So this should just be a simple re-scaling of the PMF result. I found that the results closely match those obtained using Polytopic Vector Analysis, another receptor modeling technique. However, I will experiment with adding this type of rescaling into the optimization when I get some time and need a more robust result.

Thanks for your help.

renozao commented 10 years ago

I wonder if this is not equivalent to doing the simpler rescaling:

d <- diag(1/rowSums(h))
h <- d %*% h
w <- w %*% solve(d)  # inverse of d; note that `w %*% 1/d` would parse as (w %*% 1)/d

since, in matrix product, we have:

w2 = h.c h2^-1 = h.c (d h)^-1 = w h (h^-1 d^-1) = w d^-1

This is strictly true if h is invertible (which is not the case here), but does not hold in general for the generalised inverse. However, the special form of d (diagonal positive) may still make it work.
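
This can be checked numerically. When h has full row rank (the usual case when k is smaller than the number of columns), the Moore-Penrose inverse satisfies h h^+ = I and (d h)^+ = h^+ d^-1 for an invertible d, so w h (d h)^+ = w d^-1 after all. The sketch below uses random matrices as stand-ins for `basis(PMF)` and `coef(PMF)`, and assumes `ginv()` from the MASS package, which appears to be what the earlier snippet relies on.

```r
library(MASS)  # for ginv()

set.seed(123)
n <- 8; p <- 6; k <- 3
w <- matrix(runif(n * k), n, k)   # stand-in for basis(PMF)
h <- matrix(runif(k * p), k, p)   # stand-in for coef(PMF)

s  <- rowSums(h)
h2 <- h / s                       # rows of h2 sum to 1

w2_ginv <- (w %*% h) %*% ginv(h2)    # rescaling via the generalised inverse
w2_diag <- w %*% diag(s, nrow = k)   # simpler rescaling: w d^-1 with d = diag(1/s)

all.equal(w2_ginv, w2_diag)          # the two approaches agree numerically
all.equal(w2_diag %*% h2, w %*% h)   # the product w h is unchanged
```

If h were rank deficient (for example, two identical rows), h h^+ would no longer be the identity and the two results could differ, so the equivalence should be read as holding under the full-row-rank assumption only.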

renozao commented 10 years ago

I am closing this, but feel free to add more comments on the subject.