msmbuilder / msmbuilder

Statistical models for biomolecular dynamics
http://msmbuilder.org
GNU Lesser General Public License v2.1

The units of the implied timescales plot and how to decide on a lag time? #1081

Closed Avon07 closed 6 years ago

Avon07 commented 6 years ago

I got the implied timescales plot below from my trajectory files, and I am wondering about the units of this plot (both the x-axis and the y-axis).

I am not sure what "todo:unit" stands for.

According to the tutorial, I guess it refers to the stride at which frames are extracted from the trajectories as input. For example, 20 would mean that we take 1 frame out of every 20 frames of the input trajectories for the MSM analysis.

Is my understanding correct?

[Figure: implied timescales plot, "implied-timescales-0424-test"]

In addition, in this plot the curves (made up of many dots) level off after 1 or 2 units along the x-axis, so I guess I can choose something between 1 and 10 as the lag time for my MSM?

Thank you very much.

msultan commented 6 years ago

For your plot, I think around 60 might work nicely. Timescales are only logarithmically correct for most MSM models.

The x-axis and y-axis are in units of simulation time steps (saved frames). So if the frames in your trajectory are 200 ps apart, multiply the x-axis and y-axis values by 200 to get the time in ps.
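As a small illustration of that conversion, here is a minimal sketch. The 200 ps frame spacing is the example value from above, and the helper name `to_physical_time` and the y-axis reading of 1500 are hypothetical, not taken from the actual plot.

```python
def to_physical_time(n_frames, frame_spacing_ps):
    """Convert a lag time or implied timescale from frames to picoseconds."""
    return n_frames * frame_spacing_ps


frame_spacing_ps = 200.0  # assumed spacing between saved frames (example value)
lag_time_frames = 60      # a value read off the x-axis
timescale_frames = 1500   # hypothetical value read off the y-axis

lag_time_ps = to_physical_time(lag_time_frames, frame_spacing_ps)
timescale_ps = to_physical_time(timescale_frames, frame_spacing_ps)

print(lag_time_ps / 1000.0, "ns")   # 12.0 ns
print(timescale_ps / 1000.0, "ns")  # 300.0 ns
```

If the trajectories were strided before featurization, the effective frame spacing would be the original spacing times the stride.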

Tagging @jadeshi and @cxhernandez just to make sure I am not wrong here.

Avon07 commented 6 years ago

I don't quite understand what you mean by “Timescales are only logarithmically correct for most MSM models.” Also, regarding the choice of lag time: in most of the articles I have read, the authors choose a lag time at which the curve levels off. In my plot, I am not sure whether the curve between 0 and 20 can be regarded as level, and whether a value in that range could be chosen as the lag time. As for the units of the axes, I guess "time steps" means the same as "frames"? If so, I think I understand. Many thanks for your help!

jadeshi commented 6 years ago

“Timescales are only logarithmically correct for most MSM models.” just means that the plot looks flat on a log-scale. This is the default mode of how the implied timescale plots are shown anyway (e.g. your plot), so you don't have to worry about it.

In practice, the implied timescales rarely level off completely, except for very simple systems like alanine dipeptide, so usually we just pick a lag time where the curve starts to look flat relative to the part of the curve before it (subjective, I know). In this case, I'd agree with @msultan that 60 ns, or maybe even 40 ns, should be a good choice.
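To make "flat relative to the curve before it" a bit more concrete, here is a rough NumPy sketch. It uses the standard relation t_i = -τ / ln|λ_i| between a transition matrix eigenvalue λ_i at lag time τ and the implied timescale t_i. The `first_flat_lag` helper and its 5% tolerance are a hypothetical heuristic, not something msmbuilder provides, and the example numbers are invented rather than read from the plot in this issue.

```python
import numpy as np


def implied_timescales(T, lag_time, n_timescales=2):
    """Implied timescales t_i = -lag_time / ln|lambda_i| of the slowest processes."""
    eigvals = np.linalg.eigvals(T)
    # Sort eigenvalue magnitudes in descending order and skip the
    # stationary eigenvalue (lambda = 1), which has no finite timescale.
    mags = np.sort(np.abs(eigvals))[::-1][1:n_timescales + 1]
    return -lag_time / np.log(mags)


def first_flat_lag(lag_times, timescales, rel_tol=0.05):
    """Hypothetical heuristic: first lag time at which the slowest timescale
    changed by less than rel_tol relative to the previous lag time."""
    for i in range(1, len(lag_times)):
        change = abs(timescales[i] - timescales[i - 1]) / timescales[i - 1]
        if change < rel_tol:
            return lag_times[i]
    return None  # the curve never became flat within the tolerance


# A perfectly Markovian toy chain: its implied timescale does not depend
# on the lag time, i.e. the curve is completely flat.
T1 = np.array([[0.9, 0.1],
               [0.2, 0.8]])
for lag in (1, 5, 10):
    T_lag = np.linalg.matrix_power(T1, lag)
    print(lag, implied_timescales(T_lag, lag, n_timescales=1))

# Invented slowest-timescale values (in frames) for a curve that flattens slowly.
lags = [1, 10, 20, 40, 60, 80]
slowest = [100.0, 400.0, 700.0, 950.0, 990.0, 1000.0]
print(first_flat_lag(lags, slowest))
```

A threshold like this only formalizes the eyeballing; the tolerance itself is still a subjective choice.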

Finally, yes, time steps and frames are interchangeable in this case, so however long a frame is in your simulation, multiply it by the numbers in your plot to get real time.

Avon07 commented 6 years ago

Thank you for your answers. I am also wondering: if I have, for example, 200 clusters, what is the difference between MSMs built with a lag time of 40 ns and 60 ns? Are they the same, or at least similar thermodynamically?

Also, I guess the choice of lag time is not very strict: as long as the curve levels off or, as you said, looks "relatively flat", we can use that lag time?

Thank you very much!

jadeshi commented 6 years ago

I'd expect the MSMs to be similar. The thermodynamics and kinetics of the system should be independent of lag time in a truly Markovian regime (i.e. where the timescales plot is actually completely flat). To verify this, I'd recommend just building the models at 40 ns and 60 ns and comparing the first (left) eigenvectors of their transition matrices, which represent the equilibrium state populations. You could also compare the transition matrices directly and calculate, for example, the average difference in transition probabilities.
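A rough sketch of such a comparison with plain NumPy follows. The matrices are toy stand-ins built from one hypothetical 3-state chain; in msmbuilder, the real ones would presumably come from the `transmat_` attribute of two fitted `MarkovStateModel` objects, which is an assumption about your workflow. One variation on the direct comparison: since matrices estimated at different lag times describe steps of different length, the sketch propagates both to a common time (40 × 3 = 60 × 2 = 120) before taking the difference.

```python
import numpy as np


def stationary_distribution(T):
    """Left eigenvector of T for eigenvalue 1, normalized to sum to 1."""
    eigvals, eigvecs = np.linalg.eig(T.T)
    idx = np.argmax(eigvals.real)      # the eigenvalue closest to 1 is the largest
    pi = np.abs(eigvecs[:, idx].real)  # fix the arbitrary sign of the eigenvector
    return pi / pi.sum()


def mean_abs_difference(A, B):
    """Average absolute difference between corresponding matrix entries."""
    return np.mean(np.abs(A - B))


# Toy stand-ins: a hypothetical 3-state chain at lag 20, propagated to
# lags 40 and 60. Real models would be estimated separately from data.
T20 = np.array([[0.95, 0.04, 0.01],
                [0.03, 0.95, 0.02],
                [0.01, 0.04, 0.95]])
T40 = np.linalg.matrix_power(T20, 2)
T60 = np.linalg.matrix_power(T20, 3)

# Thermodynamics: compare the equilibrium populations.
pi40 = stationary_distribution(T40)
pi60 = stationary_distribution(T60)
print("max population difference:", np.max(np.abs(pi40 - pi60)))

# Kinetics: compare both models at the same total time of 120.
T40_at_120 = np.linalg.matrix_power(T40, 3)
T60_at_120 = np.linalg.matrix_power(T60, 2)
print("mean transition probability difference:",
      mean_abs_difference(T40_at_120, T60_at_120))
```

For this exactly Markovian toy chain both differences are zero up to floating-point error; for models estimated from data, they indicate how much the choice of lag time matters. The comparison also assumes both models retain the same set of states in the same order.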

msultan commented 6 years ago

Closing this now. Feel free to open another issue later on.