markovmodel / PyEMMA

🚂 Python API for Emma's Markov Model Algorithms 🚂
http://pyemma.org
GNU Lesser General Public License v3.0
307 stars 118 forks source link

Continuous time MSM #951

Open MamtaMohan opened 7 years ago

MamtaMohan commented 7 years ago

Hi,

Is it possible to generate continuous time MSM with pyEMMA.

Mamta

franknoe commented 7 years ago

This has been on our lists, but we hadn't had the time to finish it yet. @fabian-paul has implemented rate matrix estimator using the Kalbfleisch-Lawless method in a pull request here: https://github.com/markovmodel/msmtools/pull/57 But I'm not sure what the status is, Fabian can you comment? Maybe having the estimator in msmtools would already help you, Mamta? My feeling is that this would be easy to do.

I think it would also be relatively straightforward to wire it into PyEMMA - we'd need to implement a Model class (MasterEquationModel or ContinuousTimeMSM) that is very similar to the current MSM class but has a few changes with respect to how the eigenvalues are handled. I think we could re-use all MSM tests. The estimator would be similar to MaximumLikelihoodMSM and would just call the msmtools estimator. Probably much of the code in MSM and MaximumLikelihoodMSM can be generalized to be used in both cases. My guess is that this is a few days of coding for a good PyEMMA developer. Fabian, have you already done any of that or did you exclusively work with the low-level method?

Am 01/10/16 um 20:45 schrieb MamtaMohan:

Hi,

Is it possible to generate continuous time MSM with pyEMMA.

Mamta

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/markovmodel/PyEMMA/issues/951, or mute the thread https://github.com/notifications/unsubscribe-auth/AGMeQiQzYIDN25SJtosp4Urq4kIe3AOGks5qvwyigaJpZM4KL6L3.


Prof. Dr. Frank Noe Head of Computational Molecular Biology group Freie Universitaet Berlin

Phone: (+49) (0)30 838 75354 Web: research.franknoe.de

Mail: Arnimallee 6, 14195 Berlin, Germany

MamtaMohan commented 7 years ago

Thank you.

Appreciate your response.

I will give a try.

Mamta

fabian-paul commented 7 years ago

Hi Mamta, hi Frank,

the continuous time estimator that is implemented in the pull request markovmodel/msmtools#57 is functional. To use the estimator as quick as possible, you could replace your version of msmtools with the version from my branch: https://github.com/fabian-paul/msmtools/tree/ratematrix This version is quite up-to-date with the development branch.

The rate-matrix estimator is available after importing msmtools as msmtools.estimation.estimate_rate_matrix It is not integrated with Pyemma yet, estimate_rate_matrix takes a (numpy) count matrix as input and returns a (numpy) rate-matrix. Integrating this in Pyemma will require a bit of work, as we would have to provide the implied time-scale plot, the Chapman-Kolmogorov test and Transition Path Theory expressed with the rate matrix as well. I'm would be pleased if we could add these features. However I'm quite busy right now so I would prefer to implement only the functionality that is needed right now. I vigorously recommend that you carry out the Chapman-Kolmogorov-test with the estimated rate matrix because is it not guaranteed a priori that a continuous time MSM even exists for your data. So I think it would be a sensible goal to implement everything in Pyemma that is needed for a continuous time MSM CK-test. Mamta, to you need any specific functionality for the continuous time MSM?

Best, Fabian

franknoe commented 7 years ago

most of these additional features (Implied Timescales and CKTest) would come for free if an Estimator and Model class are implemented, so I think it's less work than you suggest, but of course I see you can't do it right now.

Can we merge the PR in msmtools to have the basic functionality and discuss the rest in Berlin?

Am 03/10/16 um 04:19 schrieb fabian-paul:

Hi Mamta, hi Frank,

the continuous time estimator that is implemented in the pull request markovmodel/msmtools#57 https://github.com/markovmodel/msmtools/pull/57 is functional. To use the estimator as quick as possible, you could replace your version of msmtools with the version from my branch: https://github.com/fabian-paul/msmtools/tree/ratematrix This version is quite up-to-date with the development branch.

The rate-matrix estimator is available after importing msmtools as |msmtools.estimation.estimate_rate_matrix| It is not integrated with Pyemma yet, estimate_rate_matrix takes a (numpy) count matrix as input and returns a (numpy) rate-matrix. Integrating this in Pyemma will require a bit of work, as we would have to provide the implied time-scale plot, the Chapman-Kolmogorov test and Transition Path Theory expressed with the rate matrix as well. I'm would be pleased if we could add these features. However I'm quite busy right now so I would prefer to implement only the functionality that is needed right now. I vigorously recommend that you carry out the Chapman-Kolmogov-test with the estimated rate matrix because is it not guaranteed a priori that a continuous time MSM even exists for your data. So I think it would be a sensible goal to implement everything in Pyemma that is needed for a continuous time MSM CK-test. Mamta, to you need any specific functionality for the continuous time MSM?

Best, Fabian

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/markovmodel/PyEMMA/issues/951#issuecomment-251061756, or mute the thread https://github.com/notifications/unsubscribe-auth/AGMeQhLmSuhZ98cV9AhjmCAEYo2f7QwAks5qwMi7gaJpZM4KL6L3.


Prof. Dr. Frank Noe Head of Computational Molecular Biology group Freie Universitaet Berlin

Phone: (+49) (0)30 838 75354 Web: research.franknoe.de

Mail: Arnimallee 6, 14195 Berlin, Germany

fabian-paul commented 7 years ago

I see. If we implement it as a Model with .timescales() and .propagate() we get much for free.

Yes the PR is ready to merge. It was on halt for two reasons:

franknoe commented 7 years ago

I agree. Let's go ahead, we can always swap out code if useful, but for now it's important to have the functionality.

Am 03/10/16 um 14:26 schrieb fabian-paul:

I see. If we implement it as a Model with .timescales() and .propagate() we get much for free.

Yes the PR is ready to merge. It was on halt for two reasons:

  • We were waiting for my PR on scipy to make it into the next release. That has happened and I have removed all duplicate code that now is in scipy from the PR.
  • We wanted to clarify whether my or Robert's approach works better. We didn't sort this out yet. But I don't think this should stop us from merging. If Robert's method turns out the be better, we can change that later.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/markovmodel/PyEMMA/issues/951#issuecomment-251201373, or mute the thread https://github.com/notifications/unsubscribe-auth/AGMeQvShWgiDGHWoj_G7EC9zf7x8wA2Pks5qwVbNgaJpZM4KL6L3.


Prof. Dr. Frank Noe Head of Computational Molecular Biology group Freie Universitaet Berlin

Phone: (+49) (0)30 838 75354 Web: research.franknoe.de

Mail: Arnimallee 6, 14195 Berlin, Germany

franknoe commented 7 years ago

Matma, the msmtools code is merged. You can find the low-level functionality in msmtools.estimation.rate_matrix if you install from github. If you want to use the released version, wait a few days until the new msmtools release is cut.

marscher commented 7 years ago

@MamtaMohan you can now use msmtools-1.2 for rate matrix estimation

MamtaMohan commented 7 years ago

Thank you Frank.

I will wait few days.

Mamta


From: Frank Noe notifications@github.com Sent: Friday, October 21, 2016 1:24:29 PM To: markovmodel/PyEMMA Cc: Mohan, Mamta; Author Subject: Re: [markovmodel/PyEMMA] Continuous time MSM (#951)

Matma, the msmtools code is merged. You can find the low-level functionality in msmtools.estimation.rate_matrix if you install from github. If you want to use the released version, wait a few days until the new msmtools release is cut.

You are receiving this because you authored the thread. Reply to this email directly, view it on GitHubhttps://github.com/markovmodel/PyEMMA/issues/951#issuecomment-255429878, or mute the threadhttps://github.com/notifications/unsubscribe-auth/APcc_Yc_ztnFxcOdhhqRSFaXdHwGnTCPks5q2PVNgaJpZM4KL6L3.

marscher commented 7 years ago

the binaries the released version are also available ;)

just do a conda update msmtools

MamtaMohan commented 7 years ago

Thank you Martin.

I was waiting for conda release.

Appreciate it.

Mamta


From: Martin K. Scherer notifications@github.com Sent: Tuesday, October 25, 2016 3:01:19 PM To: markovmodel/PyEMMA Cc: Mohan, Mamta; Mention Subject: Re: [markovmodel/PyEMMA] Continuous time MSM (#951)

the binaries the released version are also available ;)

just do a conda update msmtools

You are receiving this because you were mentioned. Reply to this email directly, view it on GitHubhttps://github.com/markovmodel/PyEMMA/issues/951#issuecomment-256142554, or mute the threadhttps://github.com/notifications/unsubscribe-auth/APcc_VyrADVjBXMD7eEYHR1lg5rxNMpuks5q3lH_gaJpZM4KL6L3.

MamtaMohan commented 7 years ago

Dear Fabian,

I was caught up with things so writing to after a delay.

I am new to lot of things so please be patient with me.

I just want basic Continuous Markov Model, functionality for it there now.

Please correct me where ever you think I am off.

To make Continuous Markov Model, Using PentaPeptide test files and tutorial and following script I build the model up to the point to calculate Implied timescales Bayesian Markov state model estimator, which give estimation of msm-lag time. Then discrete Markov model is build by call: M = msm.estimate_markov_model(dtrajs, msm_lag) I understand here you would like me to use new rate matrix, first by importing msmtools.

But Isn't msmtools is already loaded with PyEmma?

As per you suggested : The rate-matrix estimator is available after importing msmtools as msmtools.estimation.estimate_rate_matrix

I tried [83]: import msmtools as msmtools.estimation.estimate_rate_matrix File "", line 1 import msmtools as msmtools.estimation.estimate_rate_matrix ^ So I imported everything from msmtools

import msmtools as MT

In [87]: MC = MT.estimation.estimate_rate_matrix(dtrajs, msm_lag)

AttributeError Traceback (most recent call last)

in () ----> 1 MC = MT.estimation.estimate_rate_matrix(dtrajs, msm_lag) AttributeError: 'module' object has no attribute 'estimate_rate_matrix' Last command I think is part of PyEMMA so it is not suppose to work. So could you please guide me how to import the required module. If you can point out calls in tutorial It will help.
franknoe commented 7 years ago

Hi,

you may need to update msmtools: conda install msmtools

then your import statement should work as this: from msmtools.estimation import estimate_rate_matrix and use as estimate_rate_matrix(args)

or import msmtools and use as msmtools.estimation import estimate_rate_matrix()

Am 16/11/16 um 02:50 schrieb MamtaMohan:

import msmtools as msmtools.estimation.estimate_rate_matrix


Prof. Dr. Frank Noe Head of Computational Molecular Biology group Freie Universitaet Berlin

Phone: (+49) (0)30 838 75354 Web: research.franknoe.de

Mail: Arnimallee 6, 14195 Berlin, Germany

MamtaMohan commented 7 years ago

Dear Frank,

I will try your suggestion again as you mentioned.

I updated PyEMMA before I started via conda update and following packages were updated:

package build
progress_reporter-1.2 py_1 14 KB acellera
conda-4.2.12 py27_0 373 KB
msmtools-1.2 np111py27_0 1.2 MB acellera
pyemma-2.2.7 np111py27_0 1.3 MB acellera
------------------------------------------------------------ 
                                       Total:         2.8 MB 

The following NEW packages will be INSTALLED:

progress_reporter: 1.2-py_1          acellera                      

The following packages will be UPDATED:

conda:             4.2.7-py27_0                                     --> 4.2.12-py27_0             
msmtools:          1.1.3-np111py27_0 http://conda.binstar.org/omnia --> 1.2-np111py27_0   acellera 
pyemma:            2.2.3-np111py27_0 http://conda.binstar.org/omnia --> 2.2.7-np111py27_0 acellera 
franknoe commented 7 years ago

then I guess you should have an msmtools that includes the feature

Am 16/11/16 um 13:43 schrieb MamtaMohan:

Dear Frank,

I will try your suggestion again as you mentioned.

I updated PyEMMA before I started via conda update and following packages were updated:

package build progress_reporter-1.2 py_1 14 KB acellera conda-4.2.12 py27_0 373 KB msmtools-1.2 np111py27_0 1.2 MB acellera pyemma-2.2.7 np111py27_0 1.3 MB acellera

|------------------------------------------------------------ Total: 2.8 MB |

The following NEW packages will be INSTALLED:

|progress_reporter: 1.2-py_1 acellera |

The following packages will be UPDATED:

|conda: 4.2.7-py27_0 --> 4.2.12-py27_0 msmtools: 1.1.3-np111py27_0 http://conda.binstar.org/omnia --> 1.2-np111py27_0 acellera pyemma: 2.2.3-np111py27_0 http://conda.binstar.org/omnia --> 2.2.7-np111py27_0 acellera |

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/markovmodel/PyEMMA/issues/951#issuecomment-260937178, or mute the thread https://github.com/notifications/unsubscribe-auth/AGMeQqZBb7283L1ntQ7A8X-FQ4KRPiUEks5q-vp3gaJpZM4KL6L3.


Prof. Dr. Frank Noe Head of Computational Molecular Biology group Freie Universitaet Berlin

Phone: (+49) (0)30 838 75354 Web: research.franknoe.de

Mail: Arnimallee 6, 14195 Berlin, Germany

fabian-paul commented 7 years ago

Hi Mamta,

The comments above are a bit out-dated. We renamed the method in msmtools to msmtools.estimation.rate_matrix. Could you give that a try? Also msmtools doens't work with dtrajs directly (like pyemma). You first have to estimate a count matrix with msmtools.estimation.count_matrix and restrict that to its connected set with msmtools.estimation.largest_connected_submatrix. That submatrix has to be given to msmtools.estimation.rate_matrix as the first argument.

Best, Fabian

MamtaMohan commented 7 years ago

ok.

Let me give it a try and let me get back to you.

marscher commented 7 years ago

On 11/16/2016 03:46 AM, Frank Noe wrote:

Hi,

you may need to update msmtools: conda install msmtools

then your import statement should work as this: from msmtools.estimation import estimate_rate_matrix The function is just called rate_matrix, in analogy to the function transition_matrix.

To easily get a correct (connected count matrix) you can first estimate an msm via pyemma and re-use the count_matrix_active property:

msmtools.estimation.rate_matrix(msm.count_matrix_active)

franknoe commented 7 years ago

Yes, correct.

Note that estimation of the rate matrix on a fine clustering that MSMs are built upon is very expensive and not necessarily very meaningful. It's a better idea to use a HMM and then do rate matrix estimation on the resulting macroscopic count matrix.

Am 17/11/16 um 13:40 schrieb Martin K. Scherer:

On 11/16/2016 03:46 AM, Frank Noe wrote:

Hi,

you may need to update msmtools: conda install msmtools

then your import statement should work as this: from msmtools.estimation import estimate_rate_matrix The function is just called rate_matrix, in analogy to the function transition_matrix.

To easily get a correct (connected count matrix) you can first estimate an msm via pyemma and re-use the count_matrix_active property:

msmtools.estimation.rate_matrix(msm.count_matrix_active)

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/markovmodel/PyEMMA/issues/951#issuecomment-261236995, or mute the thread https://github.com/notifications/unsubscribe-auth/AGMeQito0es5i0knR86mO07yvLGWEQlHks5q_EtNgaJpZM4KL6L3.


Prof. Dr. Frank Noe Head of Computational Molecular Biology group Freie Universitaet Berlin

Phone: (+49) (0)30 838 75354 Web: research.franknoe.de

Mail: Arnimallee 6, 14195 Berlin, Germany

MamtaMohan commented 7 years ago

Thank you.

Hopefully I will be able to get this today or tomorrow and I will get back to you.

MamtaMohan commented 7 years ago

Sorry for the delay.

This is what I understand from the discussion:

For Continuous Markov Model: After I build discrete Markov Model via PyEMMA I should first build HMM as Frank suggested. I did make HMM for 4 states for Penta-peptide but I realized As Martim suggested for rate_matrix I imported msmtools as: import msmtools as msmtools I checked Markov model which is in M. M.count_matrix_active give me result: In [126]: M.count_matrix_active Out[126]: array([[ 6.91231723, 0. , 0. , ..., 1.97494778,

  1. , 9.8747389 ],
    1. , 4.95939945]])

Just listing part of output. In [129]: M.count_matrix_active.shape Out[129]: (250, 250) Finally to get rate_matrix: In [130]: MMC = msmtools.estimation.rate_matrix(M.count_matrix_active) ...... AssertionError: Commands ends with AssertionError. I think it is this function: def function_and_gradient(self, x): 343 if self.sparsity is None: --> 344 assert np.all(x >= 0) 345 else: 346 assert np.all(x > 0)

If you can help me fix this I would appreciate it.

fabian-paul commented 7 years ago

Hi Mamta,

apologies for the long delay. I had some time to dig into this error only now. I created a pull request with a tentative fix on https://github.com/markovmodel/msmtools/pull/98 There seems to be some internal problem of the l_bfgs_b routine as it apparently can violate the constraints that are imposed during the minimization of the likelihood. The fix would be to install a new version of msmtools from github once the pull request is merged and call rate_matrix with the option on_error='warn'. This will turn the error that you get into a warning. Hopefully then the l_bfgs_b minimizer will recover from overshooting the bounds in a subsequent iteration, s. t. the result will be ok. I know that this might not be the solution you expect. However there are still some numerical difficulties with the maximum-likelihood rate matrix estimator. The gradient of the likelihood requires an element-wise division by the transition matrix which leads to a bad condition. In a previous commit, I had already implemented the ad-hoc fix that was proposed by Pande and McGibbon for this problem in DOI:10.1063/1.4926516 This might already solve the problem.

Fabian

MamtaMohan commented 7 years ago

Dear Fabian,

Thank you for your response.

I will try the fix.

If you want I can wait a bit.

Mamta


From: fabian-paul notifications@github.com Sent: Monday, January 02, 2017 1:01:10 PM To: markovmodel/PyEMMA Cc: Mohan, Mamta; Mention Subject: Re: [markovmodel/PyEMMA] Continuous time MSM (#951)

Hi Mamta,

apologies for the long delay. I had some time to dig into this error only now. I created a pull request with a tentative fix on markovmodel/msmtools#98https://github.com/markovmodel/msmtools/pull/98 There seems to be some internal problem of the l_bfgs_b routine as it apparently can violate the constraints that are imposed during the minimization of the likelihood. The fix would be to install a new version of msmtools from github once the pull request is merged and call rate_matrix with the option on_error='warn'. This will turn the error that you get into a warning. Hopefully then the l_bfgs_b minimizer will recover from overshooting the bounds in a subsequent iteration, s. t. the result will be ok. I know that this might not be the solution you expect. However there are still some numerical difficulties with the maximum-likelihood rate matrix estimator. The gradient of the likelihood requires an element-wise division by the transition matrix which leads to a bad condition. In a previous commit, I had already implemented the ad-hoc fix that was proposed by Pande and McGibbon for this problem in DOI:10.1063/1.4926516 This might already solve the problem.

Fabian

- You are receiving this because you were mentioned. Reply to this email directly, view it on GitHubhttps://github.com/markovmodel/PyEMMA/issues/951#issuecomment-270003023, or mute the threadhttps://github.com/notifications/unsubscribe-auth/APcc_amL0pwjv89JitiSJqpoipU-4QVSks5rOTtmgaJpZM4KL6L3.