ppsp-team / StratiPy

Graph regularized nonnegative matrix factorization (GNMF) in Python
BSD 3-Clause "New" or "Revised" License

GNMF convergence #13

Closed amansohane closed 9 years ago

amansohane commented 9 years ago

Hi, are you able to get good convergence values for GNMF factorisation with any datasets (simulated or TCGA)? I am looking at norm(X - WH).

deep-introspection commented 9 years ago

Hi! The validation is still under way but results with TCGA are quite encouraging. However, diffusion seems to play a bigger role than the GNMF part in the global NBS procedure. Best, Guillaume

amansohane commented 9 years ago

Hi, thanks for the reply. I have also implemented a version of this in R + Python, but with the Euclidean norm for GNMF, and I am not seeing convergence to anywhere near small values. I am not sure whether it is a problem with my implementation or whether the algorithm simply behaves this way. If it is not an implementation problem, then I am also not seeing any significant dependence on the "gamma" value, the graph regularisation parameter. As you said, the major contribution must come from diffusion. Also, if this norm is not converging, doesn't that imply that the low-rank approximation is not valid?
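For reference, a minimal sketch of Frobenius-norm (Euclidean) GNMF multiplicative updates of the kind discussed here, following the standard Cai et al. scheme. This is not the code from either implementation: placing the graph penalty on the columns of X and all variable names are assumptions made for illustration.

```python
import numpy as np

def gnmf_frobenius(X, A, k, gamma=1.0, n_iter=200, eps=1e-9, seed=0):
    """Frobenius-norm GNMF via multiplicative updates (illustrative sketch).

    X     : (m, n) nonnegative data matrix, e.g. genes x patients
    A     : (n, n) adjacency matrix of the graph over the columns of X
    k     : number of components
    gamma : graph regularization weight
    """
    rng = np.random.default_rng(seed)
    m, n = X.shape
    W = rng.random((m, k))
    H = rng.random((k, n))
    D = np.diag(A.sum(axis=1))              # degree matrix; L = D - A is the Laplacian

    errors = []
    for _ in range(n_iter):
        # Plain NMF update for W (the graph penalty does not involve W)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
        # Graph-regularized update for H, penalty gamma * Tr(H (D - A) H^T)
        H *= (W.T @ X + gamma * H @ A) / (W.T @ W @ H + gamma * H @ D + eps)
        errors.append(np.linalg.norm(X - W @ H, "fro"))
    return W, H, errors
```

Tracking the `errors` list over iterations is what "convergence of norm(X - WH)" refers to in this thread.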

deep-introspection commented 9 years ago

That's very interesting!

Indeed, my feeling is that GNMF does not seem to add much over NMF + diffusion, and that gamma does not affect the final decomposition much. Is your code online too? I could probe this in a similar fashion so that we have a replication of the result.

By the way, what do you mean by "the low rank approximation is not valid"? I don't get how it relates to the convergence of the algorithm.

amansohane commented 9 years ago

I have to ask my employer's permission before sharing the code.

In the original Hofree et al. paper, they showed the significance of GNMF over NMF, but I am not able to replicate that. They show that with plain NMF the clustering is very poor, whereas with GNMF good clusters come out.

By low-rank approximation, what I mean is that you are trying to approximate the original genes × patients matrix by two low-rank matrices, which essentially gives the clustering. But if norm(X - WH) is not converging to small values, then the approximation is not a good one. This is what I am worried about. I see you have implemented the KL-divergence version of GNMF, so if you are getting proper convergence there, then maybe I messed up somewhere.
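One way to make the "is the low-rank approximation valid" question concrete is to look at the relative rather than the absolute error. A small sketch of that check (a hypothetical helper, not part of StratiPy):

```python
import numpy as np

def relative_error(X, W, H):
    """Relative Frobenius reconstruction error ||X - WH||_F / ||X||_F."""
    return np.linalg.norm(X - W @ H, "fro") / np.linalg.norm(X, "fro")

# Values near 0 mean the rank-k factorization captures most of X;
# values close to 1 mean the low-rank approximation explains very little.
```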

deep-introspection commented 9 years ago

Ok, let me know then. You can email me at deep[at]introspection.eu if needed.

Regarding convergence, I don't get small values of norm(X - WH) either, and I only see an effect of the number of components, not of gamma.
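A small sketch of how that dependence could be probed, reusing the hypothetical `gnmf_frobenius` and `relative_error` helpers sketched above (X and the adjacency A are assumed to be already loaded):

```python
import itertools

# Hypothetical sweep over the number of components and gamma.
for k, gamma in itertools.product([2, 3, 4, 6, 8], [0.0, 0.1, 1.0, 10.0]):
    W, H, errors = gnmf_frobenius(X, A, k=k, gamma=gamma)
    print(f"k={k:2d}  gamma={gamma:5.1f}  rel. error={relative_error(X, W, H):.3f}")
```

If the reported errors vary with k but barely with gamma, that matches the observation described here.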

Regarding the Hofree et al. comparison between GNMF and NMF, are you referring to Fig. 2b? I actually wonder whether they are applying NMF without diffusion of the mutation profile.

amansohane commented 9 years ago

Thank you. This is exactly my observation on gamma and number of clusters.

Actually I was referring to Figure 3a (co-clustering matrices for uterine cancer patients, comparing NBS (STRING) (top) to standard consensus clustering (bottom)).

What do you think about the high norm values at convergence? Do you think the clusters are reliable when these error norms are so large?

deep-introspection commented 9 years ago

According to the paper: "standard consensus clustering approach not based on network knowledge (i.e., the same NBS pipeline in Fig. 1a without network smoothing and substituting NMF for NetNMF)", so basically their effect could come from the diffusion and not at all from GNMF.

I have no idea to what extent the absence of convergence would mean that the decomposition is wrong. This is indeed quite puzzling, and I now want to check with simulated networks.

amansohane commented 9 years ago

Hi, I am not able to run your code on my laptop. Does it require a lot of RAM? I see that you are converting sparse matrices to dense matrices in several places.

deep-introspection commented 9 years ago

Hi! The current code is indeed quite demanding in RAM. If you see a way around that, please do not hesitate to make a pull request. There are apparently ways of optimizing GNMF with respect to memory, but I have not been able to implement them yet: http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0077162
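For what it's worth, a sketch of the kind of memory-friendly variant this points at: keep X and the adjacency A in scipy.sparse format and only hold the small factors W and H densely. This is illustrative only, not a patch against this repository, and reuses the update rules sketched earlier in the thread.

```python
import numpy as np
import scipy.sparse as sp

def gnmf_frobenius_sparse(X, A, k, gamma=1.0, n_iter=200, eps=1e-9, seed=0):
    """Frobenius-norm GNMF updates with X and A kept sparse (illustrative sketch).

    Only the factors W (m, k) and H (k, n) are dense, so memory scales
    with k rather than with the full data / adjacency matrices.
    """
    X = sp.csr_matrix(X)
    A = sp.csr_matrix(A)
    degrees = np.asarray(A.sum(axis=1)).ravel()   # degree vector instead of a dense D

    rng = np.random.default_rng(seed)
    m, n = X.shape
    W = rng.random((m, k))
    H = rng.random((k, n))

    for _ in range(n_iter):
        # Keep the sparse operand on the left so products come out as small dense blocks.
        W *= np.asarray(X @ H.T) / (W @ (H @ H.T) + eps)
        WtX = np.asarray(X.T @ W).T               # (k, n) without ever densifying X
        HA = np.asarray(A @ H.T).T                # (k, n); A assumed symmetric
        H *= (WtX + gamma * HA) / ((W.T @ W) @ H + gamma * H * degrees + eps)
    return W, H
```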