markovmodel / msmtools

Tools for estimating and analyzing Markov state models
GNU Lesser General Public License v3.0

Metastable decomposition of nonreversible P-matrices with Schur decomposition #56

Closed franknoe closed 3 years ago

franknoe commented 9 years ago

Study and consider implementing this approach:

https://opus4.kobv.de/opus4-zib/frontdoor/index/index/docId/5573

There might be something to do about the interpretation of the resulting vectors in terms of metastable sets.

marscher commented 4 years ago

@msmdev Have you started working on this now? If you need any help, I'd be happy to answer any question.

fabian-paul commented 4 years ago

Let me remark that I wrote an experimental version here. It's a minimal working version which needs some rewriting.

marscher commented 4 years ago

I think that is very useful as a starting point! Could you please elaborate a bit, which parts one would need to rewrite/optimize/fix?

msmdev commented 4 years ago

@marscher I just started to dig into Fabian's code... As far as I can tell at the moment, one needs to address several points. Some input from Fabian from his point of view would be most welcome :)

fabian-paul commented 4 years ago

Here are a few questions/decisions that come to mind:

marscher commented 4 years ago

my two cents on your points:

  1. If g-pcca is considered superior to pcca+, we should also use it for the reversible case, right? If not, we should provide a switch like you suggested, but keep both implementations separate to avoid side effects.
  2. dunno
  3. I would avoid taking extra time reinventing the wheel :)
  4. Usually the number of metastable states is low in our systems, but since we want to keep it general I'd suggest using it.
  5. totally agree...

franknoe commented 4 years ago

I do not agree with 1. The Schur decomposition approach is for the nonreversible setting; why would we replace the reversible algorithm with it?

fabian-paul commented 4 years ago

@marscher has a point. In the reversible setting (and provided that we pass in the stationary distribution to the algorithm), the Schur decomposition is just an eigendecomposition. So (except for the handling of disconnected states in the reversible implementation) G-PCCA is a true generalization of PCCA. Also I think, PCCA is typically applied to the connected set of T matrices. So G-PCCA could replace the current PCCA for most cases. Still I would be reluctant to just replace PCCA; we don't have enough practical experience with G-PCCA to do so.
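
Fabian's point can be checked numerically in a few lines: if T is reversible with respect to pi, then D^{1/2} T D^{-1/2} with D = diag(pi) is symmetric, and the real Schur form of a symmetric matrix is diagonal, i.e. the Schur decomposition coincides with the eigendecomposition. A minimal sketch with a toy matrix (not taken from the codebase):

```python
import numpy as np
from scipy.linalg import schur

# Toy 3-state reversible chain: building T from a symmetric count matrix C
# guarantees detailed balance pi_i T_ij = pi_j T_ji by construction.
C = np.array([[8.0, 1.0, 1.0],
              [1.0, 5.0, 3.0],
              [1.0, 3.0, 8.0]])
T = C / C.sum(axis=1, keepdims=True)
pi = C.sum(axis=1) / C.sum()  # stationary distribution

# Symmetrize: S = D^{1/2} T D^{-1/2} is symmetric iff T is reversible.
D = np.sqrt(pi)
S = D[:, None] * T / D[None, :]
assert np.allclose(S, S.T)

# For a symmetric matrix the real Schur form R is diagonal, so the Schur
# decomposition is just the eigendecomposition.
R, Z = schur(S, output='real')
print(np.allclose(R, np.diag(np.diag(R)), atol=1e-10))  # True
```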

marscher commented 4 years ago

Because it does not suffer from the issues of degenerate transition matrices, which have to be accounted for by PCCA. That's why I thought it could be considered a superior algorithm and should be chosen by default.

marscher commented 4 years ago

I agree that we should keep both implementations in msmtools. In PyEMMA/Scikit-Time we should preferably use G-PCCA in the end.

msmdev commented 4 years ago

I will write my comments below the original points for clarity:

Here are a few questions/decisions that come to mind:

  • User interface: should g-pcca and pcca be accessed through the same function (with an optional reversible=True/False flag) or should we have two different functions? Also, reversible PCCA currently contains a lot of special logic to handle absorbing states, which does not conserve the spectral properties of the T matrix and which isn't required for G-PCCA. So G-PCCA and PCCA are quite different algorithms. I'm also not sure what the overall support for non-reversible T matrices in msmtools is. In PyEMMA, support is virtually non-existent. I think it's better in msmtools. But how much better?

Since G-PCCA is the true and natural generalization of PCCA, there is indeed no need to use PCCA any longer. Still, many users might at first be uncomfortable trusting G-PCCA and might want to compare the results of PCCA and G-PCCA (for reversible T). So I would advise writing two separate functions (at least at the core), also because PCCA needs much more complicated overhead that is unnecessary for G-PCCA. To the outside, one could wrap both functions in one caller and choose via a reversible=True/False flag at first. Alternatively/later, one could use G-PCCA as the standard and offer a flag PCCA=True/False.
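
A minimal sketch of the wrapper idea being discussed (all names are hypothetical, not the actual msmtools API):

```python
# Hypothetical sketch: keep the two core routines separate and dispatch in
# a thin caller via a reversible=True/False flag.
def _pcca_reversible(T, m):
    # placeholder for the existing PCCA+ implementation
    return "pcca"

def _gpcca(T, m):
    # placeholder for the new G-PCCA implementation
    return "gpcca"

def coarse_grain(T, m, reversible=True):
    """Thin caller: PCCA+ for reversible T, G-PCCA otherwise."""
    return _pcca_reversible(T, m) if reversible else _gpcca(T, m)

print(coarse_grain(None, 3))                    # pcca
print(coarse_grain(None, 3, reversible=False))  # gpcca
```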

  • There are at least two different suggestions how to handle the degenerate case where two or more eigenvalues are 1. In the current implementation I just adapted the preprocessing step from TICA to MSMs to exclude the first constant eigenvectors (as documented here: notes-G-PCCA.pdf), but there is also a different approach based on the QR decomposition. Not sure what is best and whether we should change it again. Maybe it's also possible to change the simplex algorithm such that it no longer requires a well-defined constant eigenvector?

I remember we (Fabian, Marcus and me) already discussed this via email in 2018. There is no problem regarding degenerate 1s in G-PCCA. We also account for the possibility that 1 might not appear explicitly in the Schur vectors via a "trick": following Perron-Frobenius, we know that the 1-vector must lie in the subspace belonging to the eigenvalue 1 (even if it is degenerate). So there is no problem if the eigensolver or Schur decomposition outputs a basis that doesn't contain 1 explicitly. The 1-vector can be inserted into the basis explicitly by replacing a basis vector with 1 and making the basis orthonormal again. This works because 1 must be contained in the basis or is at least given as a linear combination of basis vectors. Practically, one has to ensure that, if there already is a constant vector in the basis, it is this constant vector that gets replaced by 1. Otherwise one would reduce the rank of the basis and the subspace would change.
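
The replacement trick described above can be sketched as follows (a toy illustration with my own variable names, not the actual implementation):

```python
import numpy as np

def insert_constant_vector(X):
    """Force the normalized constant vector into the orthonormal basis X.

    Replaces the column of X with the largest overlap with the constant
    vector (so an already-constant column is the one swapped out and the
    rank is preserved), then re-orthonormalizes via QR.
    """
    n = X.shape[0]
    ones = np.full(n, 1.0 / np.sqrt(n))  # normalized constant vector
    X = X.copy()
    X[:, np.argmax(np.abs(X.T @ ones))] = ones
    Q, _ = np.linalg.qr(X)  # re-orthonormalize (Gram-Schmidt would do too)
    return Q

rng = np.random.default_rng(0)
X, _ = np.linalg.qr(rng.standard_normal((6, 3)))  # some orthonormal basis
Q = insert_constant_vector(X)
ones = np.full(6, 1.0 / np.sqrt(6))
print(np.allclose(Q.T @ Q, np.eye(3)))      # orthonormal again: True
print(np.allclose(Q @ (Q.T @ ones), ones))  # 1 now lies in span(Q): True
```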

  • There are two ways to do an "ordered real Schur decomposition": either use a standard Schur decomposition from scipy and "sort" it afterwards, or implement your own Schur decomposition. I had opted for the first, because there was published code for the sorting step. Jan Brandts, the author of the sorting routine, gave us permission to use it in PyEMMA, provided we give credit to him in the documentation. I guess we can keep it that way and not write our own decomposition routine. Or do you disagree?

I used the first alternative: even if it is a little slow for large matrices, I would (at least at first) keep it this way.
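
For reference, scipy's Schur decomposition can also select eigenvalues directly via a `sort` callable (handled by LAPACK's gees), which moves the selected eigenvalues to the top-left block. A sketch with a toy transition matrix:

```python
import numpy as np
from scipy.linalg import schur

T = np.array([[0.90, 0.10, 0.00],
              [0.05, 0.90, 0.05],
              [0.00, 0.10, 0.90]])  # eigenvalues 1.0, 0.9, 0.8

# Select eigenvalues with modulus above a cutoff; for output='real' the
# callable receives the real and imaginary parts of each eigenvalue.
cutoff = 0.85
R, Z, sdim = schur(T, output='real',
                   sort=lambda re, im: re * re + im * im > cutoff ** 2)

print(sdim)                         # 2 eigenvalues selected (1.0 and 0.9)
print(np.allclose(Z @ R @ Z.T, T))  # still a valid decomposition: True
```

Note that this only partitions selected from unselected eigenvalues; it does not impose a strict ordering among the selected ones, so a sorting routine like Brandts' may still be needed for that.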

  • @msmdev has some gradient-based optimization routines for the simplex algorithm. This could be useful as well, because the current implementation scales terribly with the number of metastable states. Should this be included?

I guess you mean the optimization of the objective (which is the difference between a simplex algorithm and PCCA...)? I used a damped, global, constrained (or unconstrained) Gauss-Newton method with an error-oriented convergence criterion called NLSCON (from Deuflhard). It is implemented in Fortran 77 and MATLAB, but not in Python. Getting it into Python might become quite some work. Maybe one could use the Fortran code, but I never tested it, since I used the MATLAB implementation. The use of fminsearch is for sure prohibitively slow for larger systems... An alternative would be to formulate the optimization problem as a bilinear program (which would be best and most natural), but for this we would need to ask Marcus Weber for help, and it is tedious work.
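
Until NLSCON is bridged, a rough Python stand-in for a damped Gauss-Newton-type iteration is scipy.optimize.least_squares (an assumption on my part; it does not replicate NLSCON's error-oriented convergence control). A toy residual fit, purely illustrative and unrelated to the G-PCCA objective:

```python
import numpy as np
from scipy.optimize import least_squares

# Toy problem: fit y = a * exp(b * t) to exact data via a residual vector,
# minimized by scipy's trust-region-reflective (Gauss-Newton-like) solver.
t = np.linspace(0.0, 1.0, 20)
y = 2.0 * np.exp(-1.5 * t)

def residual(p):
    a, b = p
    return a * np.exp(b * t) - y

fit = least_squares(residual, x0=[1.0, 0.0], method='trf')
print(np.allclose(fit.x, [2.0, -1.5], atol=1e-5))  # True
```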

  • Also the code needs to be cleaned up a bit.

I couldn't get really deep into it yet, but as far as I saw, it would need to be mostly rewritten if we want to keep the simple structure of the G-PCCA algorithm. The current implementation seems to be too close to the PCCA implementation, which makes it more complicated... This is also why I think it is necessary to separate it from PCCA into its own function.

msmdev commented 4 years ago

> I do not agree to 1. The Schur decomposition approach is for nonreversible setting, why would we replace the reversible algorithm with it?

@franknoe it is quite natural to use G-PCCA (and Schur) for both the reversible and nonreversible case, since the Schur decomposition simply reduces to the spectral decomposition in the reversible case (more precisely, for irreducible, symmetric and positive-definite Markov chains).

marscher commented 4 years ago

We can use this (tested?) Fortran code and bridge it with f2py if there are no volunteers to reformulate the optimization problem. I could provide this bridge once you push the code somewhere. It is then usable from Python with no headache.

msmdev commented 4 years ago

@marscher I haven't tested the Fortran code, but used the MATLAB variant, which seems to be a quite direct translation of the Fortran original. Still, we could bridge to Fortran and run some tests (I have several test cases...) to check its sanity. I'm still working on the G-PCCA implementation in MSMTools. Hopefully I will finish this within the next few weeks (I'm currently a little busy - I switched to a new postdoc position in a machine learning / bioinformatics group in Tübingen in October). Afterwards (after merging), we can tackle the Fortran implementation, and your help would be very welcome!

Marius1311 commented 4 years ago

Hi all, I would be really interested in trying this out, is there any progress on implementing G-PCCA in msmtools? Or is there any other python implementation available?

fabian-paul commented 4 years ago

Hi Marius, here is a first version: https://github.com/fabian-paul/msmtools/blob/gpcca/msmtools/analysis/dense/pcca.py#L394 I think it's functional, but @msmdev planned to improve it further. Not sure if there is any interest in continuing the development of PCCA. Your help would be greatly appreciated by the community. Fabian

msmdev commented 4 years ago

Hi, Marius! This is still work in progress (at least on my side...) and currently very untidy. I took a look at Fabian's implementation, but decided to start again from scratch for most parts. I started, but didn't make a lot of progress, since I have a lot on my list. The MATLAB implementation, on the other hand, is fully functional (https://github.com/msmdev/gpcca). At some point I will port it to Python anyway, but currently I have an important deadline in May - so I think I won't make a lot of progress till then. If it is urgent and you would be interested in helping, we could discuss how to accelerate this and try to split up some tasks. Best, Bernhard

msmdev commented 4 years ago

Ok, I took a look at what I already have, and it isn't so bad. I will try to invest a few hours on the weekend and see how close to functional this gets the code... Good that you asked and thus reminded me :)

Marius1311 commented 4 years ago

Wow, thanks a lot, that was fast! I will check out what @fabian-paul posted and get back to you! Maybe that already does the job for me. If not, would be happy to get involved!

msmdev commented 4 years ago

As far as I remember there was some issue with the implementation of @fabian-paul, but I currently can't put my finger on it... I hope to get back to this on the weekend.

fabian-paul commented 4 years ago

I think @msmdev had a better (simpler) idea than me for how to handle the case of degenerate eigenvalues (repeated eigenvalues that are typically 1). This situation can come up in disconnected Markov models.
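
A minimal illustration of how disconnected models produce this degeneracy: each connected component contributes its own eigenvalue 1.

```python
import numpy as np

# A disconnected Markov model: two components that never exchange
# probability. The eigenvalue 1 is then degenerate (multiplicity 2).
T = np.array([[0.9, 0.1, 0.0, 0.0],
              [0.2, 0.8, 0.0, 0.0],
              [0.0, 0.0, 0.7, 0.3],
              [0.0, 0.0, 0.4, 0.6]])
vals = np.sort(np.linalg.eigvals(T).real)[::-1]
print(np.allclose(vals[:2], 1.0))  # two eigenvalues equal to 1: True
```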

Marius1311 commented 4 years ago

Hi all, we are still experimenting with @fabian-paul's gpcca implementation - is there any news from @msmdev regarding your implementation? I also had a general question regarding scalability: there is no sparse Python implementation of the real Schur decomposition, is there? The problems we would like to apply your tools to are quite large (~1k - 100k states).

fabian-paul commented 4 years ago

Hi Marius, I tried my own implementation with 10k states. This worked fine on a workstation with 8 GB memory (possibly less RAM would also suffice). That's also why there are so many scattered del commands in my implementation. I'm not aware of a sparse Schur decomposition. You might also ask Bernhard or Marcus; I remember that they had come up with an equivalent formulation of G-PCCA that does not need the Schur decomposition. Probably this part of the conversation should be continued by email. One more comment: do you really need 100k Markov states? Wouldn't it be nearly impossible to generate enough sampling to populate such a huge matrix? Maybe a clever choice of basis functions or some dimension reduction would eliminate that need.

msmdev commented 4 years ago

@Marius1311 I made good progress this weekend. Hope to be able to serve a working implementation by middle or end of the week :)

msmdev commented 4 years ago

@Marius1311 The main problem is not the Schur decomposition itself, but the sorting of it. This is currently the bottleneck. There is a way to speed this up significantly, but there is no implementation whatsoever of it as far as I know... I totally agree with @fabian-paul: it doesn't appear sane to me to use up to 100k states from a statistical point of view (you would need tremendous amounts of data...). You should choose appropriate basis functions or preprocess the raw data via dimension-reduction methods.

Marius1311 commented 4 years ago

Thanks @msmdev for the update!

Marius1311 commented 4 years ago

Hi @fabian-paul , when trying to run your gpcca code, I get an import error: import msmtools.analysis.dense.pcca as p raises

---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
<ipython-input-17-28d338aa3934> in <module>
----> 1 import msmtools.analysis.dense.pcca as p

~/Projects/msmtools/msmtools/__init__.py in <module>
     40 
     41 from . import analysis
---> 42 from . import estimation
     43 from . import generation
     44 from . import dtraj

~/Projects/msmtools/msmtools/estimation/__init__.py in <module>
     90 """
     91 from __future__ import absolute_import
---> 92 from .api import *

~/Projects/msmtools/msmtools/estimation/api.py in <module>
     38 from scipy.sparse import issparse
     39 from scipy.sparse.sputils import isdense
---> 40 from . import dense
     41 from . import sparse
     42 

~/Projects/msmtools/msmtools/estimation/dense/__init__.py in <module>
     22 from . import transition_matrix
     23 from . import covariance
---> 24 from . import mle_trev
     25 from . import mle_trev_given_pi
     26 from . import tmatrix_sampler

ImportError: cannot import name 'mle_trev'

fabian-paul commented 4 years ago

That's unrelated to PCCA and you could avoid the bug by commenting out that line. mle_trev is a compiled module and probably you have to recompile (python setup.py install). Also, possibly my fork is a bit outdated and not fully compatible with the current development branch of msmtools. I'll look into that.

Marius1311 commented 4 years ago

Thanks! I commented out that line and a couple more imports that caused bugs once that first line was commented out and now it runs.

Marius1311 commented 4 years ago

So here are some first results: Fabian's implementation works for us and gives quite nice results already. In practice, we may often have the case of a degenerate 1-eigenvalue, so your implementation @msmdev would still be very valuable!

msmdev commented 4 years ago

@Marius1311 Nice to hear that you like G-PCCA! This motivates me to finish the official implementation! :) Please don't forget to cite us ;) (Reuter, B., Weber, M., Fackeldey, K., Röblitz, S., & Garcia, M. E. (2018). Generalized Markov State Modeling Method for Nonequilibrium Biomolecular Dynamics: Exemplified on Amyloid β Conformational Dynamics Driven by an Oscillating Electric Field. Journal of Chemical Theory and Computation, 14(7), 3579–3594. https://doi.org/10.1021/acs.jctc.8b00079)

Marius1311 commented 4 years ago

Thanks @msmdev for pointing me to the paper, I will definitely cite you, I really appreciate your help!

Marius1311 commented 4 years ago

I meanwhile profiled @fabian-paul's gpcca implementation on a large problem (20k states), and by far the most expensive computation was the Schur decomposition (> 1/2 of the computation time), see the screenshot below. Are there ways to speed this up?

[Screenshot: profiling results, 2020-04-07]

msmdev commented 4 years ago

@Marius1311 You could compare it with the time Matlab needs to perform the Schur decomposition (I could imagine that the Matlab implementation is better optimized), but besides optimization I don't see a way to speed this up significantly. Anyway, I think this is an acceptable computation time - compared, for example, with the time needed to solve the eigenvalue problem for the same matrix. Just out of interest: into how many macrostates did you cluster (since the sorting went so fast...)?

Marius1311 commented 4 years ago

Hi @msmdev, I agree, it's an acceptable computation time, but the way I understand it, we really only need a subset of the Schur decomposition, yet we have to compute the whole one, unfortunately. Since my matrix is quite sparse, I can compute the top ~10-20 eigenvectors in ~20 seconds on my laptop (for the same 20k x 20k matrix). That's why PCCA is super fast for me - the only problem being that my matrix isn't reversible.

I only went for 5 macrostates.
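
What Marius describes is easy to reproduce: for a sparse T, ARPACK computes a handful of dominant eigenpairs cheaply, whereas the dense Schur decomposition scales as O(n^3). A sketch with an artificial sparse stochastic matrix (sizes illustrative):

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import eigs

n = 2000
A = sp.random(n, n, density=0.002, random_state=0, format='csr')
A = A + sp.eye(n)  # self-loops ensure strictly positive row sums
T = sp.diags(1.0 / np.asarray(A.sum(axis=1)).ravel()) @ A  # row-stochastic

vals, vecs = eigs(T, k=5, which='LM')  # 5 eigenpairs of largest modulus
print(np.isclose(np.abs(vals).max(), 1.0))  # Perron eigenvalue 1: True
```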

msmdev commented 4 years ago

@Marius1311 Hi Marius, I see your point here... Actually, it is indeed possible to perform a partial Schur decomposition, but since you need to sort it at the same time, it is a little more intricate. There is a paper by Brandts (from where we took the sorting algorithm for the whole Schur form) where he also outlines a combined partial Schur decomposition and sorting algorithm: Brandts, J. H. (2002). Matlab code for sorting real Schur forms. Numerical Linear Algebra with Applications, 9(3), 249–261. https://doi.org/10.1002/nla.274 (publicly available: https://www.researchgate.net/publication/227793922_Matlab_code_for_sorting_real_Schur_forms). Sadly, there is so far no implementation of it... and I don't have the time to change that soon, but any contribution on that side would be very welcome!

msmdev commented 4 years ago

@Marius1311 Regarding the 20k states you are clustering into only five macrostates: I really think you could benefit from some preprocessing to obtain a smaller matrix... Consequently, the Schur decomposition even of the whole matrix wouldn't be so costly any more. Maybe we could discuss your problem/application in some more detail, but since this is a little off topic for this thread, we should do this via email (bernhard.reuter@uni-tuebingen.de).

axsk commented 4 years ago

Hello all, Marcus Weber pointed me to this discussion. From my experience the sorting is actually not necessary beyond getting the dominant Schur vectors. Since this is possible with a partial Schur decomposition/factorization (by targeting the leading n eigenvalues), I think that this is the way to go. ARPACK and SLEPc seem to offer implementations with Python bindings, c.f. https://slepc.upv.es/documentation/reports/str7.pdf

All the best, Alexander
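
A scipy-only sketch of the underlying idea (the Krylov-Schur route via slepc4py would compute a sorted partial Schur basis directly; here, under my own toy setup, I only illustrate that what G-PCCA needs is an orthonormal basis of the leading invariant subspace, not the full Schur form):

```python
import numpy as np
from scipy.linalg import orth
from scipy.sparse.linalg import eigs

# toy nonreversible transition matrix with 3 metastable blocks
rng = np.random.default_rng(2)
n, k = 300, 3
T = 0.05 * rng.random((n, n))
for b in range(3):
    blk = slice(b * 100, (b + 1) * 100)
    T[blk, blk] += 1.0
T /= T.sum(axis=1, keepdims=True)

# Leading eigenvectors via ARPACK. A complex-conjugate pair contributes a
# 2D real subspace through its real and imaginary parts, so stacking both
# and orthonormalizing always yields a real basis of the dominant
# invariant subspace.
vals, vecs = eigs(T, k=k, which='LM')
Q = orth(np.column_stack([vecs.real, vecs.imag]))

# invariance check: T maps span(Q) (numerically) into itself
residual = T @ Q - Q @ (Q.T @ T @ Q)
print(np.linalg.norm(residual) < 1e-8)  # True
```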

msmdev commented 4 years ago

@axsk Hi Alex, thx for the input. Yes, we actually only sort the leading Schur vectors, but currently we need to perform the whole Schur decomposition before the sorting... So a sorted partial decomposition is indeed what we are looking for. Do you have a working code example for a partial sorted Schur decomposition of the dominant Schur vectors (belonging to the leading eigenvalues) with ARPACK or SLEPc that you could provide?

axsk commented 4 years ago

I have worked with GPCCA / the Schur version of PCCA+ since my bachelor thesis in 2014/15 (which sadly never got cited :( ). Since everyone in the CMD working group was running their own copy of PCCA+/GPCCA, we decided to bundle and maintain a generic implementation in a lightweight Python package, together with other tools core to the transfer-operator approaches developed in our group.

You can find it in this repo. Although it is at an early stage and the API may still be subject to change, the GPCCA implementation seems to work just fine.

I just added support for the Krylov-Schur method from SLEPc in a separate branch. Note that you need the package slepc4py, which for me took a few minutes to compile.

For simple usage examples have a look in the test file. An accompanying technical report specifying the algorithmic details is work in progress.

Whilst I do see the similarity between the packages, I hope you don't understand it as an attempt to hijack your efforts. With GPCCA being at the core of our group's work, we require a library written from our take on the subject. In the hope of avoiding unnecessary rivalry, I would be glad to adapt the API to your needs, facilitating usage of our GPCCA implementation in your project.

msmdev commented 4 years ago

@axsk I will take a look at it regarding Krylov-Schur. Anyway, I have already finished the G-PCCA implementation for MSMTools. Currently I am just running the tests and will thus push them in the next days...

Marius1311 commented 4 years ago

Hi all, thanks @axsk for the input - I will check out your implementation. Thanks also @msmdev, I'm curious to see your implementation as well!

marscher commented 3 years ago

https://github.com/msmdev/pyGPCCA is finally public, congrats @msmdev

@clonker fyi, eventually this should be used in deeptime-ml as the algorithm of choice for coarse-graining when creating non-reversible MSMs.

msmdev commented 3 years ago

@marscher @clonker we are currently finalizing the documentation and will soon deploy on PyPI and conda, so pyGPCCA will be ready to be used. Btw, GPCCA is - in my opinion - not only the method of choice to coarse-grain non-reversible MSMs, but also for reversible MSMs, since it is the natural generalization of PCCA+. Further, pyGPCCA is much faster than PCCA+ when coarse-graining huge matrices if one uses method='krylov' to compute and sort the Schur decomposition. To do so, one has to install SLEPc (see here), but the speedup is huge. Otherwise, one can rely on scipy and the Brandts sorting algorithm (see here) to do the job (method='brandts'; just slower).

clonker commented 3 years ago

This is great, congrats indeed @msmdev! I like that you kept SLEPc as an optional dependency, makes integrating it so much easier. Looking forward to your package and documentation!

axsk commented 3 years ago

@msmdev Considering that most of the SLEPc code originates from my repository, I would appreciate a mention in the contributor section. Thank you very much :) Alexander Sikorski

msmdev commented 3 years ago

> @msmdev Considering that most of the SLEPc code originates from my repository, I would appreciate a mention in the contributor section. Thank you very much :) Alexander Sikorski

@axsk thx for pointing out that you pointed us to SLEPc (as we mentioned in the acknowledgements of our cellrank preprint here). The actual code integrating SLEPc in pyGPCCA was written by @michalk8, @Marius1311 and me, but you undoubtedly gave us some inspiration on where to start. Feel free to pm me if you would like to discuss further mentions.

axsk commented 3 years ago

I am not sure if a copy and paste of my code (compare https://github.com/msmdev/msmtools/commit/86600db5cf92e5ae6d1dc6844af97d7b372fb4a7 to my code), after requesting that code from me, is merely an inspiration.

However, saying that "you surely didn't contribute anything to _gpcca." (c.f. my PR) certainly goes too far.

Please reconsider your stance on this issue.

msmdev commented 3 years ago

Alex, as I mentioned in your PR, I am willing to include appropriate mentions in the related part of pyGPCCA (in _sorted_schur), but not in the main code _gpcca. I think this is absolutely appropriate - also, our code is surely not just copy and paste, and the part that is similar follows not only your example but also the SLEPc documentation. As I told you in your pull request: you can supply us with an appropriate wording, but you cannot press us to include you in every header of code that isn't even related to incorporating SLEPc. Also, this (here) isn't the right place to discuss this.

msmdev commented 3 years ago

> I am not sure if a copy and paste of my code (compare msmdev@86600db to my code), after requesting that code from me is merely an inspiration.
>
> However saying that "you surely didn't contribute anything to _gpcca." (c.f. my PR) certainly goes to far.
>
> Please reconsider your stance on this issue.

You should rather compare with _sorted_schur in pyGPCCA, since the commit you mention was a first trial in a fork of msmtools and has developed substantially since then. Thus your code was indeed a first inspiration, but it is not identical ("...copy and paste...") to our implementation in pyGPCCA, as you claim. This would be fair.