markovmodel / PyEMMA

🚂 Python API for Emma's Markov Model Algorithms 🚂
http://pyemma.org
GNU Lesser General Public License v3.0

TICA and MSM Lag time Enquiry #1403

Closed lunnali closed 5 years ago

lunnali commented 5 years ago

Dear USERS

I am new to the Markov state model method as well as to this software, so please excuse some simple questions.

Regarding the lag time parameters of, e.g., pyemma.coordinates.tica() and pyemma.msm.bayesian_markov_model(): can I choose these lag times independently of each other, or are there some restrictions?

I ask because in the example tutorial at http://www.emma-project.org/latest/generated/00-pentapeptide-showcase.html, the lag time for TICA is first chosen based on the VAMP-2 score. After the TICA reduction, the MSM lag time is initially set to the same value as the TICA lag time, in order to compare VAMP-2 scores for a series of numbers of discrete states and choose a suitable number of states. Later, the chosen MSM lag time is checked with an ITS convergence test to see whether it is good enough for the MSM, and it is. If I have understood this procedure correctly: is it always the case that a good TICA lag time is also good enough for building the MSM? Or does it depend on the data and featurization?

Thank you so much for your patience!

Best Wishes

thempel commented 5 years ago

Hi lunnali,

In principle, both TICA and MSMs approximate the transfer operator; TICA does so in a linear basis of feature functions, and MSMs in a basis of step functions defined by your discretization. Hence, in theory and neglecting all possible sources of error, choosing a large enough lag time should yield converged results for both TICA and MSMs. In practice, mainly due to the discretization error, a single shared lag time usually does not work, so the two lag times can be chosen independently. (It might make limited sense to choose an MSM lag time smaller than the TICA lag time, because that would mean first filtering out faster processes and subsequently trying to model them with an MSM.)

Experience has shown that for TICA, a shorter lag time often yields better results (you can plot a couple of example trajectories along the first TIC(s) to get a feeling for the quality of the transformed data). So you might very well end up with two different lag times: one that optimally transforms your data into a lower-dimensional space with TICA, and another for the MSM that yields converged implied timescales. In practice, then, a good TICA lag time is not necessarily a good MSM lag time; it depends on your data, featurization and discretization.
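To make the role of the discretization error concrete, here is a minimal toy sketch. It deliberately does not use PyEMMA; it relies only on the Python standard library. The 4-state chain, its rates (A_FAST, B_SLOW), the 2-state lumping and all function names are hypothetical and chosen purely for illustration. It computes the implied timescale, t(τ) = -τ / ln λ₂(τ), of a coarse 2-state MSM at several lag times and compares it with the exact slowest timescale of the underlying chain.

```python
import math

# Hypothetical toy system: a symmetric 4-state chain with fast hops inside
# each well (A_FAST) and a rare hop between the two wells (B_SLOW).
A_FAST, B_SLOW = 0.05, 0.01
P = [
    [1 - A_FAST, A_FAST, 0.0, 0.0],
    [A_FAST, 1 - A_FAST - B_SLOW, B_SLOW, 0.0],
    [0.0, B_SLOW, 1 - A_FAST - B_SLOW, A_FAST],
    [0.0, 0.0, A_FAST, 1 - A_FAST],
]
PI = [0.25] * 4      # P is symmetric, so the stationary distribution is uniform
LUMP = [0, 0, 1, 1]  # coarse 2-state discretization of the 4 microstates


def matmul(x, y):
    """Plain matrix product of two square matrices given as nested lists."""
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def matpow(m, power):
    """m raised to a non-negative integer power by repeated multiplication."""
    result = [[float(i == j) for j in range(len(m))] for i in range(len(m))]
    for _ in range(power):
        result = matmul(result, m)
    return result


def coarse_implied_timescale(lag):
    """Implied timescale (in steps) of the lumped 2-state MSM at this lag."""
    p_lag = matpow(P, lag)
    # Exact time-lagged correlation between the coarse states.
    c = [[0.0, 0.0], [0.0, 0.0]]
    for i in range(4):
        for j in range(4):
            c[LUMP[i]][LUMP[j]] += PI[i] * p_lag[i][j]
    # Row-normalize to a 2x2 transition matrix; its eigenvalues are 1 and
    # trace - 1, so the second eigenvalue is t00 + t11 - 1.
    t00 = c[0][0] / (c[0][0] + c[0][1])
    t11 = c[1][1] / (c[1][0] + c[1][1])
    lam2 = t00 + t11 - 1.0
    return -lag / math.log(lam2)


# Exact slowest eigenvalue of the 4-state chain (from its antisymmetric block).
lam_slow = (1 - A_FAST - B_SLOW) + math.sqrt(A_FAST**2 + B_SLOW**2)
true_timescale = -1.0 / math.log(lam_slow)

lags = [1, 5, 20, 50, 100]
timescales = [coarse_implied_timescale(lag) for lag in lags]
for lag, ts in zip(lags, timescales):
    print(f"lag {lag:4d}: implied timescale {ts:7.2f} steps "
          f"(exact: {true_timescale:.2f})")
```

In this toy model the coarse MSM underestimates the slow timescale at lag 1 (about 99.5 steps versus an exact value of about 110.5) and approaches the exact value from below as the lag grows. That is the behaviour an implied-timescales convergence check looks for, and it is governed by the discretization rather than by whatever lag time was used for the dimensionality reduction.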

Cheers, Tim

lunnali commented 5 years ago

Thanks a lot! :)