markovmodel / PyEMMA

🚂 Python API for Emma's Markov Model Algorithms 🚂
http://pyemma.org
GNU Lesser General Public License v3.0

VAMP & VAMPnet implementation in PyEMMA #1224

Closed mkhoshle closed 6 years ago

mkhoshle commented 6 years ago

Hi all,

I am very new to PyEMMA and I am trying to test VAMP (Variational approach for learning Markov processes from time series data) and VAMPnets (VAMPnets: Deep learning of molecular kinetics) on alanine dipeptide simulations. Are these methods implemented anywhere in PyEMMA? Can anyone point me to some examples of these methods? I would also like to know which methods are available in PyEMMA for analyzing non-equilibrium simulations.

Thanks,

marscher commented 6 years ago

Currently there is no direct integration of VAMPnet in PyEMMA, and I don't know whether there are plans to integrate it. The VAMP method is on a feature branch named vamp, if you want to try it out. Expect the latter to be merged/released soon. The answer to your last question depends on what you are trying to achieve. You can build Markov state models based upon local equilibrium data (on the Markov states). The OOMReweightedMSM estimator tries to correct errors if only the local equilibrium has been reached. If your system is externally driven, you can still analyse stationary quantities when you perform a reversible estimate. This is, however, a crude assumption, because the system is inherently not reversible since it's driven.
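
To make the point about driven systems concrete, here is a minimal sketch in plain Python. It is not PyEMMA code, and all function names are hypothetical. It compares a non-reversible estimate (row-normalized transition counts) with a naive reversible one obtained by symmetrizing the count matrix, which is a simpler stand-in for a proper maximum-likelihood reversible estimator. For a toy system driven around a cycle, the symmetrized model erases the direction of the cycle.

```python
def count_matrix(dtraj, n_states, lag=1):
    """Count transitions i -> j observed at the given lag time."""
    counts = [[0.0] * n_states for _ in range(n_states)]
    for i, j in zip(dtraj[:-lag], dtraj[lag:]):
        counts[i][j] += 1.0
    return counts


def row_normalize(counts):
    """Turn counts into transition probabilities (assumes no empty rows)."""
    return [[c / sum(row) for c in row] for row in counts]


def symmetrize(counts):
    """Naive reversible estimate: treat every i -> j as also j -> i."""
    n = len(counts)
    return [[counts[i][j] + counts[j][i] for j in range(n)] for i in range(n)]


# Toy driven system (an assumption for illustration): 0 -> 1 -> 2 -> 0 forever.
dtraj = [0, 1, 2] * 100
C = count_matrix(dtraj, n_states=3)

T_nonrev = row_normalize(C)           # keeps the direction of the cycle
T_rev = row_normalize(symmetrize(C))  # forward and backward steps look alike

print(T_nonrev[0])  # transition probabilities out of state 0, non-reversible
print(T_rev[0])     # same row after symmetrization
```

The symmetrized model still has sensible state populations, which is why stationary quantities can survive a reversible estimate, but any net flux around the cycle is gone.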

franknoe commented 6 years ago

An initial VAMPnet implementation is currently in github.com/markovmodel/deeptime. We are working on generalizing it / making it better. Likely the neural network stuff will end up in the deeptime package because it requires different dependencies and would blow up PyEMMA too much.

VAMP is already used in PyEMMA's MSM score functions, but it will be available also for TICA later.
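
As a rough illustration of what a VAMP score measures for a discrete-state model, here is a sketch in plain Python. It is not PyEMMA's implementation, and the function name is hypothetical. It assumes state indicator functions as the basis, so the instantaneous covariance matrices are diagonal and the VAMP-2 score (the squared Frobenius norm of C00^(-1/2) C0t Ctt^(-1/2)) reduces to a sum over entries.

```python
def vamp2_score(dtraj, n_states, lag=1):
    """VAMP-2 score of a discrete trajectory using state indicators as basis.

    With indicator functions, C00 and Ctt are diagonal (state populations at
    times t and t + lag), so the squared Frobenius norm of
    C00^(-1/2) C0t Ctt^(-1/2) becomes a plain sum over matrix entries.
    """
    n_pairs = len(dtraj) - lag
    c0t = [[0.0] * n_states for _ in range(n_states)]
    for i, j in zip(dtraj[:-lag], dtraj[lag:]):
        c0t[i][j] += 1.0 / n_pairs
    c00 = [sum(row) for row in c0t]        # populations at time t
    ctt = [sum(col) for col in zip(*c0t)]  # populations at time t + lag
    score = 0.0
    for i in range(n_states):
        for j in range(n_states):
            if c0t[i][j] > 0.0:
                score += c0t[i][j] ** 2 / (c00[i] * ctt[j])
    return score


# Fully predictable cycle: every singular value is 1.
score_cycle = vamp2_score([0, 1, 2] * 100, n_states=3)
# At lag 1 this looks memoryless: from either state, both successors are
# (almost) equally likely, so only the trivial singular value remains.
score_mixed = vamp2_score([0, 0, 1, 1] * 50, n_states=2)
print(score_cycle, score_mixed)
```

In this basis the score is bounded below by 1 (the trivial constant singular function) and above by the number of states, so higher values indicate more slow or predictable structure captured at the chosen lag.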

mkhoshle commented 6 years ago

@marscher @franknoe Thank you. I have read several of your most recent papers on applying dimensionality reduction to non-equilibrium data, but I think I am a little confused: the Variational Approach (VA), TICA, and MSMs are applicable to ergodic, reversible, metastable processes.

However, since ultra-long trajectories are not always possible to get, people try improved estimators like OOMReweightedMSM, the Koopman model (based on EDMD), VAMP, etc. for non-reversible, non-stationary dynamics (i.e. non-equilibrium data).

1) I suppose OOMReweightedMSM, Koopman model (based on EDMD), VAMP are different ways of achieving stationary estimates from non-reversible data. Am I correct? Which of these methods is superior to others and why?

2) Based on what you explained above, does that mean that OOMReweightedMSM is applicable only if data is in local equilibrium or non-reversible data with external forces?

3) I believe even if we start the simulation from equilibrium, the simulation deviates from equilibrium over time and this kind of system is irreversible (i.e. the data do not satisfy detailed balance) until it reaches global equilibrium. Therefore, all of these approaches for non-reversible processes should be applicable. Is that correct?

4) Also, how do you know if you are in local equilibrium for complicated systems?

franknoe commented 6 years ago

Hi Mahzad, sorry for the long delay. VAMP is now merged and included in the 2.5 release. But I would wait for 2.5.1 or use the devel version, as there are still some hiccups, it seems. @marscher, I don't see any mention of VAMP in the 2.5 Changelog. We should definitely include this.

@mkhoshle, regarding your questions: You need to differentiate between systems that are really out of equilibrium (i.e. systems that do not just operate in thermal equilibrium, but systems that are driven, such as an ion channel in an ion gradient, or a protein that is unfolded by force), and systems that are in principle simulated in equilibrium, but whose individual trajectories are too short to sample from the equilibrium distribution.

VAMP and VAMPnets are unique in that they cover all these cases.

I think, however, you are referring to the second case. Then you have many options, because most methods work with short trajectories, including OOMs, HMMs, MSMs, VAMP and VAMPnets. TICA with default parameters does not, but it will work if you use the option reversible=False. In practice, if you just use TICA for dimension reduction, my guess is that it doesn't matter much except for pathological cases.
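
The effect of a reversible versus non-reversible covariance estimate can be illustrated without PyEMMA. The sketch below is plain Python with hypothetical function names, and it assumes that the reversible estimate amounts to averaging the time-lagged covariance matrix with its transpose. For a toy signal that circles in one direction, the lagged covariance is not symmetric, and symmetrizing it removes exactly the part that encodes the direction of rotation.

```python
import math


def lagged_covariance(traj, lag):
    """Time-lagged covariance C0t[a][b] = mean of x_a(t) * x_b(t + lag).

    No mean removal is done; the toy signal below already has zero mean.
    """
    dim = len(traj[0])
    n_pairs = len(traj) - lag
    c0t = [[0.0] * dim for _ in range(dim)]
    for x, y in zip(traj[:-lag], traj[lag:]):
        for a in range(dim):
            for b in range(dim):
                c0t[a][b] += x[a] * y[b] / n_pairs
    return c0t


def symmetrized(matrix):
    """Average a matrix with its transpose (assumed reversible estimate)."""
    n = len(matrix)
    return [[0.5 * (matrix[a][b] + matrix[b][a]) for b in range(n)]
            for a in range(n)]


# Toy non-reversible signal: a point circling with a period of 100 steps.
omega, lag, n_pairs = 2.0 * math.pi / 100, 10, 1000
traj = [(math.cos(omega * t), math.sin(omega * t))
        for t in range(n_pairs + lag)]

c0t = lagged_covariance(traj, lag)  # off-diagonals are +/- 0.5*sin(omega*lag)
c0t_sym = symmetrized(c0t)          # off-diagonals cancel
print(c0t[0][1], c0t[1][0], c0t_sym[0][1])
```

For data that are reversible in principle but sampled by short trajectories, the antisymmetric part is mostly sampling noise, which fits the guess that the choice rarely matters for pure dimension reduction.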

Personally, I prefer using either of the following: 1) Traditional method: TICA-based dimension reduction plus clustering plus HMMs, because that gives you a few-state model that is easier to interpret than an MSM with many states, and more accurate than an MSM with few states. 2) New method: VAMPnets.

I currently like VAMPnets best because it's very general and solves many problems at once, but I think getting the implementation running is still a bit tricky, and there are issues with convergence and large datasets. In other words, it will take some time to get it stable and useful for production, but give it a try. If it fails, try the traditional approach.

In most cases having enough data and a reasonable featurization is essential. If you have both, pretty much all methods work well. If you are lacking either, pretty much everything will fail (i.e. fail in Chapman-Kolmogorov and implied timescales tests) and typically give inconsistent answers. Not having enough data is the most common problem.
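
For readers unfamiliar with the two tests mentioned above, here is a minimal sketch of both on a two-state toy model, in plain Python. It is not PyEMMA code; the function names and the transition matrix are hypothetical. The implied timescale is computed as t2 = -lag / ln(lambda2) and should be roughly constant across lag times, and the Chapman-Kolmogorov check compares the lag-1 model applied twice with a model estimated directly at lag 2.

```python
import math
import random


def simulate(T, n_steps, seed=0):
    """Sample a discrete trajectory from a two-state transition matrix T."""
    rng = random.Random(seed)
    state, traj = 0, [0]
    for _ in range(n_steps):
        state = 0 if rng.random() < T[state][0] else 1
        traj.append(state)
    return traj


def estimate(dtraj, lag):
    """Row-normalized transition counts at the given lag (two states)."""
    counts = [[0.0, 0.0], [0.0, 0.0]]
    for i, j in zip(dtraj[:-lag], dtraj[lag:]):
        counts[i][j] += 1.0
    return [[c / sum(row) for c in row] for row in counts]


def implied_timescale(T, lag):
    """t2 = -lag / ln(lambda2); for a 2x2 stochastic matrix lambda2 = trace - 1."""
    return -lag / math.log(T[0][0] + T[1][1] - 1.0)


def matmul(A, B):
    """Product of two 2x2 matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


T_true = [[0.9, 0.1], [0.2, 0.8]]  # hypothetical toy model
dtraj = simulate(T_true, n_steps=20000)
T1, T2 = estimate(dtraj, lag=1), estimate(dtraj, lag=2)

# Implied timescales should not depend on the lag time.
its1, its2 = implied_timescale(T1, 1), implied_timescale(T2, 2)

# Chapman-Kolmogorov: T(1) applied twice should reproduce T(2).
T1_sq = matmul(T1, T1)
ck_error = max(abs(T1_sq[i][j] - T2[i][j]) for i in range(2) for j in range(2))
print(its1, its2, ck_error)
```

With too little data or a poor discretization, the two timescale estimates drift apart with increasing lag and the Chapman-Kolmogorov error grows, which is the kind of failure described above.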