markovmodel / PyEMMA

🚂 Python API for Emma's Markov Model Algorithms 🚂
http://pyemma.org
GNU Lesser General Public License v3.0

Metastable state distribution #1536

Closed: rubinanoor9 closed this issue 2 years ago

rubinanoor9 commented 2 years ago

Hello! I am trying to build a Markov state model. I have 300 trajectory files, each 50 ns long. I used a TICA lag time of 200 and 150 clusters for the microstates. I built the MSM after checking the implied timescale plot and the MSM spectral analysis plot, and the spectral analysis plot clearly indicated that these states can be separated. For the MSM estimation I tried two different MSM lag times (each checked individually) with this command: M = msm.estimate_markov_model(dtrajs, 10) or M = msm.estimate_markov_model(dtrajs, 20). The fraction of states used and the fraction of counts used were both 1. But when I performed Perron cluster cluster analysis (PCCA), the states were not separated, as shown in the attached file. For the metastable distributions, I used: n_sets = 4; M.pcca(n_sets); pcca_dist = M.metastable_distributions; pcca_sets_4 = M.metastable_sets
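The call msm.estimate_markov_model(dtrajs, lag) hides the estimation step itself. As a rough illustration of what happens underneath, here is a minimal numpy sketch, assuming dtrajs is a list of integer microstate sequences and the lag is given in trajectory steps: count transitions in a sliding window at that lag, then row-normalize. This is a simplified, non-reversible estimate; PyEMMA by default estimates a reversible matrix on the largest connected set, so its numbers will not match this sketch exactly. The helper names are hypothetical.

```python
import numpy as np

def count_matrix(dtrajs, lag, n_states):
    """Sliding-window transition counts C[i, j] at `lag` steps (lag >= 1)."""
    C = np.zeros((n_states, n_states))
    for dtraj in dtrajs:
        d = np.asarray(dtraj, dtype=int)
        if d.size > lag:
            # np.add.at accumulates correctly when (i, j) pairs repeat
            np.add.at(C, (d[:-lag], d[lag:]), 1.0)
    return C

def transition_matrix(C):
    """Row-normalized (non-reversible) estimate; rows without counts stay zero."""
    row_sums = C.sum(axis=1, keepdims=True)
    return np.divide(C, row_sums, out=np.zeros_like(C), where=row_sums > 0)

dtrajs = [[0, 0, 1, 1, 0], [1, 0]]  # toy discrete trajectories
C = count_matrix(dtrajs, lag=1, n_states=2)
T = transition_matrix(C)
```

Changing the lag from 10 to 20 only changes which frame pairs are counted, which is why the resulting model (and its PCCA decomposition) can differ noticeably between the two lags.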

My question is: where am I making an error? I am a little confused about how to separate these states, and about how many states can be separated by looking at the free energy plot. I have attached a file with the plots from each analysis: plots.pdf

Thanks in advance. Regards, Rubina

thempel commented 2 years ago

Hi Rubina, I can't open that PDF for some reason. Can you try printing it to PDF?

rubinanoor9 commented 2 years ago

Please check it now: plots.pdf

thempel commented 2 years ago

To me it looks like there are two separate issues:

Multiple PCCA states in one basin of the free energy plot: I guess that you picked your number of metastable states based on the basins in the free energy plot. However, these are sometimes misleading. For example, one basin in your case is very small; it may be a transition region rather than a metastable state and appear as a "blob" only because of the histogramming. So 3 states may be a more appropriate choice.
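A complementary check that does not rely on the free energy histogram (a common heuristic, not something prescribed in this thread) is to look for a gap in the implied timescales: m metastable states correspond roughly to m - 1 slow processes that are clearly separated from the faster ones. The sketch below assumes a row-stochastic transition matrix T estimated at lag `lag` (in steps); the function names and the `max_states` cutoff are hypothetical.

```python
import numpy as np

def implied_timescales(T, lag):
    """t_i = -lag / ln|lambda_i|, slowest first, stationary process excluded."""
    mods = np.sort(np.abs(np.linalg.eigvals(T)))[::-1]  # |lambda|, descending
    # clip so that |lambda| = 1 (disconnected sets) or 0 does not give inf/nan
    mods = np.clip(mods, 1e-12, 1.0 - 1e-12)
    return -lag / np.log(mods[1:])  # mods[0] is the stationary eigenvalue 1

def suggest_n_metastable(T, lag, max_states=10):
    """Heuristic: cut where the ratio of consecutive timescales is largest.

    Needs max_states >= 3 and at least 3 states in T.
    """
    ts = implied_timescales(T, lag)[: max_states - 1]
    ratios = ts[:-1] / ts[1:]
    # a gap after the k-th slow process suggests k + 1 metastable states
    return int(np.argmax(ratios)) + 2

# Toy example: three weakly coupled 2-state blocks
inner = np.array([[0.9, 0.1], [0.1, 0.9]])
B = np.kron(np.eye(3), inner)  # block-diagonal, rows sum to 1
T = 0.999 * B + 0.001 * np.full((6, 6), 1 / 6)
print(implied_timescales(T, lag=1).round(1))
print(suggest_n_metastable(T, lag=1))  # → 3
```

The gap is only meaningful if the timescales themselves are converged in the lag time; a small free energy basin with no matching slow process is a candidate for a transition region rather than a metastable state.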

State separations in the TICA plot don't match the PCCA state separations: When comparing TICA projections and MSM/PCCA states, keep in mind that TICA is a linear projection method and that the MSM approximates the dynamics using indicator functions. In the current case, that may simply mean that the state separations in the TICA projection are no longer exact for the MSM. It looks to me like some states are nicely resolved (like the red one) but others (like the brown one) are not. That may mean that the dynamics is not well represented in the TICA plot, e.g. because it is non-linear in the input features. Maybe you can look at a trajectory of IC2 and check what the transitions between the regions in question look like: are they really slow, or rather fast?
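Besides plotting IC2 against time, one rough way to quantify "slow or fast" is to split IC2 at a dividing value read off the TICA plot and measure how long the trajectory stays on each side. The sketch below is a minimal numpy version; `threshold` is a hypothetical value you would choose yourself, and `ic2` stands for one trajectory's second TICA coordinate (for example, column 1 of one array returned by the TICA object's get_output()). Runs are truncated at trajectory ends, and recrossings near the threshold inflate the number of short runs, so treat the result as a qualitative check.

```python
import numpy as np

def dwell_times(signal, threshold):
    """Run lengths (frames) below and above `threshold` in a non-empty 1D signal."""
    side = np.asarray(signal) > threshold
    # a run ends wherever the side of the threshold changes
    change = np.flatnonzero(side[1:] != side[:-1]) + 1
    bounds = np.concatenate(([0], change, [side.size]))
    lengths = np.diff(bounds)
    above = side[bounds[:-1]]  # which side each run is on
    return lengths[~above], lengths[above]

ic2 = np.array([-1.0, -1.2, -0.9, 0.8, 1.1, -0.7, 1.5, 1.2, 0.9, 1.3])  # toy IC2 values
below, above = dwell_times(ic2, threshold=0.0)
```

If typical dwell times on both sides are comparable to or shorter than the MSM lag time, the two regions are not separated by a slow process at that lag, and PCCA has no reason to split them.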

PS: I think your implied timescales may not be converged. It may be better to check them on a double-log plot.
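On a double-log plot, implied timescales that keep rising with the lag time are not yet converged; a suitable MSM lag sits where the curve levels off. The sketch below computes the slowest implied timescale over several lags with plain numpy, using the same simplified non-reversible estimate as above and no error bars, so it only approximates what PyEMMA reports. As a sanity check it uses a synthetic 2-state Markov chain whose true timescale, -1/ln(0.97) ≈ 32.8 steps, is independent of the lag. The function names are hypothetical.

```python
import numpy as np

def slowest_timescale(dtrajs, lag, n_states):
    """t_2 = -lag / ln|lambda_2| from a row-normalized count matrix at `lag` steps."""
    C = np.zeros((n_states, n_states))
    for dtraj in dtrajs:
        d = np.asarray(dtraj, dtype=int)
        np.add.at(C, (d[:-lag], d[lag:]), 1.0)
    T = C / C.sum(axis=1, keepdims=True)  # assumes every state is visited
    mods = np.sort(np.abs(np.linalg.eigvals(T)))[::-1]
    return -lag / np.log(mods[1])

def its_vs_lag(dtrajs, lags, n_states):
    return np.array([slowest_timescale(dtrajs, lag, n_states) for lag in lags])

# Synthetic check: a true 2-state Markov chain, so t_2 should not depend on the lag
rng = np.random.default_rng(0)
P = np.array([[0.99, 0.01], [0.02, 0.98]])
n_steps = 200_000
u = rng.random(n_steps)
traj = np.empty(n_steps, dtype=int)
traj[0] = 0
for t in range(1, n_steps):
    traj[t] = 0 if u[t] < P[traj[t - 1], 0] else 1

lags = [1, 2, 5, 10]
its = its_vs_lag([traj], lags, n_states=2)
```

For real data you would pass your own dtrajs and plot `its` against `lags` with log scales on both axes (e.g. matplotlib's `plt.loglog(lags, its, "o-")`); log axes spread out the short lags, where non-convergence is usually most visible.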

rubinanoor9 commented 2 years ago

Thank you so much. I have separated these three states by changing the lag time and the number of clusters. Yes, you were right: our implied timescale plot was not converged. Changing the lag time used to build the model resolved the problem. Thanks again.