mjbock commented 10 years ago

I am hoping to use your NMF package for chemical fingerprinting. Basically chemical data from a series of samples is used to determine the end member compositions and the contribution of each end member to each sample. For the input matrix, rows are samples and columns are the chemical concentrations. The concentrations are normalized, meaning the sum of each row is 1. The nmf output should exhibit closure, meaning the rows of h should sum to 1. I have been reviewing the source code and have been unable to determine if an option is available to impose closure (rows in h sum to 1). Is this implemented? if not any advice of how best to attempt this myself? I would consider myself and intermediate R user.

Thanks for you time Mike

renozao commented 10 years ago

Hi Mike,

this type of constraint is not exactly implemented in any of the built-in algorithms, but I believe there are some NMF algorithms out there that allow for this. I can think of three ways of implementing it:

apply the 'Frobenius' (or 'lee' if used for clustering) to the transposed problem, with rescale = TRUE (as default), this imposes that the columns of W sum up to one and transpose the result:

x <- rmatrix(20,10)
tmp <- nmf(t(x), 3, 'lee')
res <- t(tmp)
rowSums(coef(res))

if this does not give you what you want, you could modify of any NMF algorithm that scales the rows of H to sum up to one, and apply the inverse scaling to the columns of W, which would not change the actual objective value: W H = W D^-1 D H. So you define your own update rule, e.g., based on the function nmf_update.lee_R, and define a new algorithm with it:

setNMFMethod('mynmf', 'Frobenius', Update  = function(i, v, x, ...){

  # add complete re-scaling here

  # return updated model
  x
}, overwrite = TRUE)

# you can now call
nmf(x, 3, 'mynmf')

add this "soft" constraint via a regularisation term which penalises having the sum of H far from 1. This requires to modify the update rules for the entries of H.
add a constraint to the optimisation problem, and solve it as a constrained optimisation problem. This also requires to modify the update rules for the entries of H.

Please, let me know if this helped.

Renaud

mjbock commented 10 years ago

I apologized for my late reply, my original reply got hung up in our e-mail system.

I utilized a modified approach that worked quite well. Rather than modify any of the NMF methods, I created a small procedure that scales the rows of h to one and calculates w:

PMF<-nmf(X,k,method='lee',seed='nndsvd')

w<-basis(PMF) h<-coef(PMF) h2<-sweep(h,1L,rowSums(h),"/",check.margin=FALSE) w2<-H.c %*%ginv(h2)

That works like a charm and allows me to change the NMF calculation methods and still get what I want.

An embarrassingly simple solution. Thanks, Mike

From: Renaud [mailto:notifications@github.com] Sent: Wednesday, December 11, 2013 7:06 AM To: renozao/NMF Cc: Mike Bock Subject: Re: [NMF] NMF Question (#3)

Hi Mike,

this type of constraint is not exactly implemented in any of the built-in algorithms, but I believe there are some NMF algorithms out there that allow for this. I can think of three ways of implementing it:

apply the 'Frobenius' (or 'lee' if used for clustering) to the transposed problem, with rescale = TRUE (as default), this imposes that the columns of W sum up to one and transpose the result:

x <- rmatrix(20,10)
tmp <- nmf(t(x), 3, 'lee')
res <- t(tmp)
rowSums(coef(res))

if this does not give you what you want, you could modify of any NMF algorithm that scales the rows of H to sum up to one, and apply the inverse scaling to the columns of W, which would not change the actual objective value: W H = W D^-1 D H. So you define your own update rule, e.g., based on the function nmf_update.lee_R, and define a new algorithm with it:

setNMFMethod('mynmf', 'Frobenius', Update = function(i, v, x, ...){

# add complete re-scaling here

# return updated model
x
}, overwrite = TRUE)

# you can now call
nmf(x, 3, 'mynmf')

add this "soft" constraint via a regularisation term which penalises having the sum of H far from 1. This requires to modify the update rules for the entries of H.
add a constraint to the optimisation problem, and solve it as a constrained optimisation problem. This also requires to modify the update rules for the entries of H.

Please, let me know if this helped.

Renaud

On 1 December 2013 22:41, mjbock notifications@github.com<mailto:notifications@github.com> wrote:

I am hoping to use your NMF package for chemical fingerprinting. Basically chemical data from a series of samples is used to determine the end member compositions and the contribution of each end member to each sample. For the input matrix, rows are samples and columns are the chemical concentrations. The concentrations are normalized, meaning the sum of each row is 1. The nmf output should exhibit closure, meaning the rows of h should sum to 1. I have been reviewing the source code and have been unable to determine if an option is available to impose closure (rows in h sum to 1). Is this implemented? if not any advice of how best to attempt this myself? I would consider myself and intermediate R user.

Thanks for you time Mike

— Reply to this email directly or view it on GitHubhttps://github.com/renozao/NMF/issues/3 .

— Reply to this email directly or view it on GitHubhttps://github.com/renozao/NMF/issues/3#issuecomment-30315047.

This message contains information that may be confidential, privileged or otherwise protected by law from disclosure. It is intended for the exclusive use of the Addressee(s). Unless you are the addressee or authorized agent of the addressee, you may not review, copy, distribute or disclose to anyone the message or any information contained within. If you have received this message in error, please contact the sender by electronic reply to email@environcorp.com and immediately delete all copies of the message.

renozao commented 10 years ago

Sure. I guess you meant X instead of H.c in your sample code.

Some comments though:

you have to be aware that it is unlikely that the factorization you get (w2, h2) is a -- local -- optimum of the constrained optimisation problem you want to solve. In this case, w2 is optimal for the sub-problem min_w |x
- wh2| given h2, but probably not for the complete problem min_{w, h} |x - wh|. Moreover, you may even get negative values in w2.
even the rescaling I suggested (within the update step), and is commonly performed in NMF algorithms, sort of messes up with the optimality conditions, although it does not affect the objective value per se.

mjbock commented 10 years ago

Not quite, left out a line:

X = input matrix

k=number of end memebers

PMF<-nmf(X,k,method='lee',seed='nndsvd') w<-basis(PMF) h<-coef(PMF) H.c<-w % * %h h2<-sweep(h,1L,rowSums(h),"/",check.margin=FALSE) w2<-H.c % * % ginv(h2)

So this should just be a simple re-scaling of the PMF result. I found that the results closely match those obtained using Polytopic Vector Analysis, another receptor modeling technique.However, I will experiment with adding this type of rescaling into the optimization when I get some time and need a more robust result.

Thanks for your help.

renozao commented 10 years ago

I wonder if this not equivalent to do the simpler rescaling:

d <- diag(1/rowSums(h))
h <- d %*% h
w <- w %*% 1/d

since, in matrix product, we have:

w2 = h.c h2^-1 = h.c (d h)^-1 = w h (h^-1 d^-1) = w d^-1

This is strictly true if h is invertible (which is not the case here), but does not hold in general for the generalised inverse. However, the special form of d (diagonal positive) may still make it work.

renozao commented 10 years ago

I am closing this, but feel free to add more comment on the subject.

renozao / NMF

NMF Question #3

X = input matrix

k=number of end memebers