Interpretation of plot_pop_resids

msmbuilder / msmexplorer

Data visualizations for biomolecular dynamics

http://msmbuilder.org/msmexplorer/

MIT License


Interpretation of plot_pop_resids #94

Closed jeiros closed 7 years ago

jeiros commented 7 years ago

I'm building an MSM on the internal dynamics of a ligand, which I think should be well sampled within microseconds of simulation. I can see 'clean' jumps in my tIC time evolution, but the pop_resids plot looks very different from the one in your documentation.

[image attachment: pop_resids plot]

What kind of information can I extract out of msme.plot_pop_resids? I've never seen this plot in a publication.

msultan commented 7 years ago

It's a proxy for how much the MSM is perturbing the raw MD population of a microstate. In the large-sampling limit, the two should be close to identical. If they differ, that can still be okay, and I would trust the MSM population more, provided you do some rigorous bootstrapping/central limit tricks for Poisson processes to ascertain the modeling error.
In this case, even if you have a lot of sampling, if your ligand has few A<->B transitions then the MSM populations might still get skewed, because the model is parameterized by the transitions and not the raw counts.
To get an idea of whether there is enough sampling to study a process, try computing the MFPT or exchange timescales between the states. The aggregate sampling should be >> the exchange timescales.
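As a rough illustration of the sampling check described above, here is a minimal NumPy sketch that computes mean first passage times from a row-stochastic transition matrix and compares them with the aggregate simulation time. The function name `mfpt_to_sink`, the two-state matrix, the lag time, and the aggregate sampling are all made-up assumptions for illustration; this is not the msmbuilder implementation.

```python
import numpy as np

def mfpt_to_sink(T, sink, lag_time=1.0):
    """Mean first passage time from every state to `sink` (hypothetical helper).

    T is a row-stochastic transition matrix estimated at `lag_time`.
    The result is in the same time units as `lag_time`.
    """
    T = np.asarray(T, dtype=float)
    n = T.shape[0]
    keep = [i for i in range(n) if i != sink]
    # For every i != sink: m_i = lag_time + sum_{j != sink} T_ij * m_j
    A = np.eye(n - 1) - T[np.ix_(keep, keep)]
    m = np.linalg.solve(A, np.full(n - 1, lag_time))
    out = np.zeros(n)  # MFPT from the sink to itself is defined as 0 here
    out[keep] = m
    return out

# Toy two-state model (A = 0, B = 1); all numbers are invented.
T = np.array([[0.9, 0.1],
              [0.2, 0.8]])
lag_time_ns = 1.0
mfpt_a_to_b = mfpt_to_sink(T, sink=1, lag_time=lag_time_ns)[0]
mfpt_b_to_a = mfpt_to_sink(T, sink=0, lag_time=lag_time_ns)[1]

aggregate_ns = 5000.0  # assumed total simulation time
ratio = aggregate_ns / max(mfpt_a_to_b, mfpt_b_to_a)
print(f"A->B: {mfpt_a_to_b:.1f} ns, B->A: {mfpt_b_to_a:.1f} ns, "
      f"aggregate / slowest exchange: {ratio:.0f}")
```

A ratio far above 1 suggests the process is crossed many times in the data; a ratio near 1 suggests the transition was seen only a handful of times.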

jeiros commented 7 years ago

Hi @msultan, thanks for your answer. So if I understood you correctly, you would expect to see a completely decorrelated cloud of points (in the case of large sampling, where the MSM populations and the MD populations match)? What exactly are the residuals? From my plot above, it looks like the Raw Populations axis is fairly homogeneously distributed, but with a strong correlation in the Residuals (?)

msultan commented 7 years ago

Exactly. For any given microstate, its raw population would be ~ the MSM population, leading to a Gaussian cloud around 0. I think the residual is np.log10(MSM)-np.log10(raw counts), so a difference of 1 means something like 10x more. However, it's important to note that for lowly sampled populations this might not be significant: 0.003 vs 0.03 is 10x but is hardly worth worrying about. Similarly, 0.003 vs 0.0003 is the same in the opposite direction, but again nothing to worry about.
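To make the residual formula above concrete, here is a small sketch using invented population vectors. The arrays `raw_pops` and `msm_pops` are assumptions for illustration, not data from this thread, and the code only mirrors the formula quoted in this comment rather than the plotting function itself.

```python
import numpy as np

# Invented microstate populations; each vector sums to 1.
raw_pops = np.array([0.50, 0.30, 0.17, 0.03])    # from raw MD frame counts
msm_pops = np.array([0.50, 0.30, 0.197, 0.003])  # from the MSM

# Residual as described above: log10(MSM) - log10(raw)
residuals = np.log10(msm_pops) - np.log10(raw_pops)

# A residual of +1 / -1 corresponds to a 10x larger / smaller MSM population.
fold_change = 10.0 ** residuals

for raw, msm, res in zip(raw_pops, msm_pops, residuals):
    print(f"raw={raw:.3f}  msm={msm:.3f}  residual={res:+.2f}")
```

The last state has a residual of -1 (0.03 vs 0.003), which is a 10x change in a state that holds only a few percent of the population.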

@cxhernandez do i have the residual formula right?

msultan commented 7 years ago

hmm, maybe we should incorporate @jadeshi's code for doing bootstrapping here somehow.
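The code mentioned above is not shown in this thread, so the following is only a generic sketch of a trajectory-level bootstrap, not @jadeshi's implementation. The helper name `bootstrap_raw_populations` and the toy trajectories are hypothetical. It resamples whole trajectories with replacement and recomputes the raw populations each time; to get error bars on the MSM populations one would instead refit the model on each resample.

```python
import numpy as np

def bootstrap_raw_populations(trajs, n_states, n_boot=200, seed=0):
    """Bootstrap raw state populations by resampling trajectories (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    # Per-trajectory state counts, shape (n_trajs, n_states)
    counts = np.array([np.bincount(t, minlength=n_states) for t in trajs])
    samples = np.empty((n_boot, n_states))
    for b in range(n_boot):
        # Draw trajectory indices with replacement
        idx = rng.integers(0, len(trajs), size=len(trajs))
        total = counts[idx].sum(axis=0)
        samples[b] = total / total.sum()
    return samples

# Toy discrete trajectories over 3 microstates; invented data.
trajs = [np.array([0, 0, 1, 1, 0]),
         np.array([1, 1, 1, 2, 2]),
         np.array([0, 0, 0, 0, 1])]

samples = bootstrap_raw_populations(trajs, n_states=3)
lo, hi = np.percentile(samples, [2.5, 97.5], axis=0)
print("95% interval per state:", list(zip(lo.round(2), hi.round(2))))
```

If the interval for a microstate is wide compared with its residual, the difference between raw and MSM populations for that state is probably not significant.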

cxhernandez commented 7 years ago

Yeah, basically you're observing how the MSM has corrected your populations. An ideal plot (with complete sampling) would have small decorrelated fluctuations in the residuals. Here, it seems like you have a set of microstates that are probably undersampled but, like @msultan said, this is probably not an issue if the MFPTs look reasonable.

@cxhernandez do i have the residual formula right?

Yup!