markovmodel / pyemma_tutorials

How to analyze molecular dynamics data with PyEMMA
Creative Commons Attribution 4.0 International

Revision TH #152

Closed: thempel closed 5 years ago

thempel commented 5 years ago

In addition to what I've already mentioned in the issues below, I've added a paragraph about large systems to the manuscript. I'm not sure this is the way to go, but I also didn't know where else to put this information. @brookehus

fixes #138, fixes #140, and adds some information about why feature selection and dimension reduction are important (#147)

thempel commented 5 years ago

fixes #139

thempel commented 5 years ago

I'd be happy if somebody could have a look at the feature_tic_correlation for di-alanine. I've added some code showing how one could search more systematically for a correlating feature in order to understand the TICA projection, but I'm not sure it conveys the message.
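For readers of this thread, the kind of systematic search mentioned above can be sketched roughly as follows. This is a minimal NumPy illustration, not the code from the notebook: the helper name `correlate_features_with_tics` is hypothetical, and the data are synthetic stand-ins in which the single "TIC" is constructed to follow feature 2.

```python
import numpy as np


def correlate_features_with_tics(features, tics):
    """Pearson correlation between every feature and every TIC.

    features: array of shape (n_frames, n_features)
    tics:     array of shape (n_frames, n_tics)
    returns:  array of shape (n_features, n_tics)
    """
    f = features - features.mean(axis=0)
    t = tics - tics.mean(axis=0)
    cov = f.T @ t / len(f)
    return cov / np.outer(f.std(axis=0), t.std(axis=0))


# Synthetic stand-in data (an assumption for illustration only).
rng = np.random.default_rng(0)
features = rng.normal(size=(5000, 4))
# The fake "TIC" is feature 2 plus a little noise.
tics = (features[:, 2] + 0.1 * rng.normal(size=5000))[:, None]

corr = correlate_features_with_tics(features, tics)
# Index of the feature with the largest absolute correlation to the first TIC.
best = int(np.argmax(np.abs(corr[:, 0])))
print("feature most correlated with TIC 0:", best)
```

Ranking features by absolute correlation, rather than inspecting them one by one, is what makes the search systematic; with real data the input features and the TICA output would take the place of the synthetic arrays.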

thempel commented 5 years ago

Thanks very much for the review!

thempel commented 5 years ago

All requested changes above have been incorporated (I hope I didn't miss anything). The manuscript section about large systems underwent a major refactoring together with @cwehmeyer. I'd be happy if @brookehus could have a look, especially at the manuscript. Thanks again.

brookehus commented 5 years ago

After reading #155 I have a question about some of the CK tests discussed in this PR, more for my personal understanding than for any manuscript changes. If I am reading correctly, it seems that if a CK test is failing, we should maybe try a CK test with a different number of metastable states (e.g. line 143 in notebook 03). However, when is this number of metastable states actually used later in the analysis? What does it mean if a CK test for 3 metastable states is bad but a CK test for 4 metastable states is good?

thempel commented 5 years ago

Important question. In principle, the CK test as conducted in PyEMMA requires an additional assumption: the number of macrostates. The Markovianity assumption is then tested by comparing the long-time predictions of the current model with new estimates at the corresponding longer lag times, both with the additional layer of coarse graining. If the test fails with, say, 3 states, the model with this number of states cannot predict future transition probabilities. One could also say that this particular coarse graining does not yield a partitioning into Markov states. If the test passes with 4 states, those states are actually Markovian, i.e. knowledge of the current macrostate is enough to predict the future behavior. I don't know if that's a good explanation; we can also discuss this later.
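A toy illustration of this point, assuming a made-up 4-state chain with uniform stationary distribution: the sketch below is not PyEMMA's CK test, only a NumPy demonstration of how one coarse graining can violate the Chapman-Kolmogorov property while another satisfies it. The helper names `lump` and `ck_error` and all numbers are hypothetical. It compares powers of the coarse-grained lag-1 matrix (the prediction) with the coarse-grained k-step matrix (what an estimate at lag k would converge to with unlimited data).

```python
import numpy as np


def lump(P, pi, sets):
    """Coarse-grain transition matrix P onto `sets`, weighting the
    microstates inside each set by the stationary distribution pi."""
    M = np.zeros((len(sets), len(sets)))
    for a, A in enumerate(sets):
        w = pi[A] / pi[A].sum()
        for b, B in enumerate(sets):
            M[a, b] = w @ P[np.ix_(A, B)].sum(axis=1)
    return M


def ck_error(P, pi, sets, max_k=5):
    """Largest deviation between predicted and reference coarse-grained
    transition probabilities over k = 1..max_k lag multiples."""
    M = lump(P, pi, sets)
    errors = []
    for k in range(1, max_k + 1):
        predicted = np.linalg.matrix_power(M, k)
        reference = lump(np.linalg.matrix_power(P, k), pi, sets)
        errors.append(np.abs(predicted - reference).max())
    return max(errors)


# Hypothetical linear chain 0-1-2-3 with slow hops between neighbours.
P = np.array([[0.9, 0.1, 0.0, 0.0],
              [0.1, 0.8, 0.1, 0.0],
              [0.0, 0.1, 0.8, 0.1],
              [0.0, 0.0, 0.1, 0.9]])
pi = np.full(4, 0.25)  # P is symmetric, so pi is uniform

four_states = [[0], [1], [2], [3]]
three_states = [[0], [1, 2], [3]]  # merges two slowly exchanging states

err_four = ck_error(P, pi, four_states)
err_three = ck_error(P, pi, three_states)
print("4 states:", err_four, " 3 states:", err_three)
```

In this toy system the 3-state partition merges states 1 and 2, whose futures differ (one tends towards state 0, the other towards state 3), so knowing only the macrostate is not enough and the predictions drift from the reference. The 4-state partition keeps them apart and the two agree.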