New method: Landmark Kernel tICA

j3mdamas commented 7 years ago

Hi guys,

Just saw this paper from Pande et al and I thought I would share it with you guys: http://biorxiv.org/content/early/2017/04/04/123752

Still haven't read it thoroughly, but it looks promising, especially the non-linearity aspect. They seem to improve on their previous paper: http://pubs.acs.org/doi/abs/10.1021/ct5007357

What do you think? Could this landmark kernel tICA be interesting for PyEMMA?

Thanks, João

franknoe commented 7 years ago

Hey João, sorry for being out of sync. Let me first explain the relationship between the methods. To the best of my knowledge, the basis of nonlinear approximation of MD eigenfunctions is our variational approach. I think it would be fair for follow-up works to say that clearly and to use consistent nomenclature, instead of just referring to TICA. TICA was originally introduced (in the 90s) as a linear combination of signal variables, which for us are molecular coordinates or maybe simple order parameters. The combination of nonlinear functions of state space, and the actual idea that this algorithm approximates MD eigenfunctions, was introduced as VAC (variational approach of conformation dynamics) here:

https://pdfs.semanticscholar.org/ace6/d4fbac23c0e6bb185cdbb259d5a1242e8671.pdf

http://publications.imp.fu-berlin.de/1388/1/14_JCTC_NueskeEtAl_Variational.pdf

(The second paper is more readable than the first, and there's also a recent review in Curr. Opin. Struct. Biol.)

The kernel formulation of the TICA problem is a slightly different but closely related way to introduce nonlinearity. Kernel formulations are algorithmically most attractive when you have fewer timesteps than dimensions. This is a typical scenario in geosciences, but not what we have. Practically, in order to use the kernel formulation for MD you need some sort of subsampling, because otherwise you'd have to compute, store and solve a T x T matrix, where T is the number of timesteps. "Landmark" here means you select a number k of frames and compute the kernel function (e.g. Gaussian) between them and all other frames, i.e. you only compute a T x k matrix, and you use that to approximate the eigenvalues of the full problem. We also have some Nystroem stuff coming up that goes in this direction.
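To make the T x k idea concrete, here is a minimal NumPy sketch of a Gaussian landmark kernel matrix. It is only an illustration under assumptions: the function name `gaussian_landmark_features`, the random toy data, the uniformly random choice of landmark frames and the bandwidth `sigma` are all made up for this example and are not taken from the paper or from PyEMMA.

```python
import numpy as np

def gaussian_landmark_features(X, landmarks, sigma):
    """Return the T x k matrix K[t, j] = exp(-|x_t - y_j|^2 / (2 sigma^2)).

    X is a (T, d) array of frames, landmarks a (k, d) array of selected frames.
    Hypothetical helper for illustration only.
    """
    # Pairwise squared distances via broadcasting: (T, 1, d) - (1, k, d) -> (T, k)
    sq_dist = ((X[:, None, :] - landmarks[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dist / (2.0 * sigma ** 2))

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))                     # toy "trajectory": T = 1000, d = 3
idx = rng.choice(len(X), size=20, replace=False)   # k = 20 landmark frames (assumed random choice)
K = gaussian_landmark_features(X, X[idx], sigma=1.0)
print(K.shape)  # (1000, 20)
```

The point of the sketch is the cost: only T * k kernel values are stored instead of T * T, so memory grows linearly with the number of frames.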

For now, I don't think this is needed, however. I think just using the variational approach with Gaussians (as in the second paper above) will behave very similarly, if it's not even mathematically equivalent (I'm not sure). In principle all the code for that is in PyEMMA; we even have estimators that work with very short trajectories now, see here:

 https://arxiv.org/pdf/1610.06773.pdf

However, I agree it would be helpful to have a convenient estimator for this stuff in PyEMMA, especially for something like Gaussian basis sets on landmarks, which is likely to give a good and efficient approximation of eigenfunctions and be useful for dimension reduction, and it has the interpretation of a Markov transition model (MTM):

 http://aip.scitation.org/doi/full/10.1063/1.4913214

The only disadvantage I see compared to direct TICA is that you lose the direct interpretation in terms of molecular order parameters; it is more like an MSM.
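As a rough sketch of what the variational approach with Gaussian basis functions on landmarks could look like, the following NumPy example estimates the instantaneous and time-lagged covariance matrices of the basis functions and solves the resulting generalized eigenvalue problem. Everything in it is an assumption for illustration: the helper names `gaussian_basis` and `vac_eigen`, the 1D double-well toy dynamics, the symmetrized estimator, and the parameter values. It does not use PyEMMA and is not the estimator discussed in this thread.

```python
import numpy as np

def gaussian_basis(X, centers, sigma):
    """Evaluate Gaussian basis functions centered on the landmarks: (T, k) array."""
    sq = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq / (2.0 * sigma ** 2))

def vac_eigen(chi, lag, eps=1e-6):
    """Solve C_tau v = lambda C_0 v for a basis-function time series chi (T x k)."""
    chi0, chit = chi[:-lag], chi[lag:]
    n = len(chi0)
    # Symmetrized (reversible) estimates of the two covariance matrices
    c0 = (chi0.T @ chi0 + chit.T @ chit) / (2.0 * n)
    ct = (chi0.T @ chit + chit.T @ chi0) / (2.0 * n)
    # Whiten with C_0, discarding near-singular directions
    s, u = np.linalg.eigh(c0)
    keep = s > eps * s.max()
    w = u[:, keep] / np.sqrt(s[keep])
    lam, r = np.linalg.eigh(w.T @ ct @ w)
    order = np.argsort(lam)[::-1]          # largest eigenvalue first
    return lam[order], (w @ r)[:, order]

# Toy data: overdamped Langevin dynamics in the double well V(x) = (x^2 - 1)^2
rng = np.random.default_rng(1)
dt, n_steps = 0.01, 20000
x = np.empty(n_steps)
x[0] = -1.0
noise = rng.normal(size=n_steps - 1) * np.sqrt(2.0 * dt)
for t in range(n_steps - 1):
    x[t + 1] = x[t] - 4.0 * x[t] * (x[t] ** 2 - 1.0) * dt + noise[t]
X = x[:, None]

landmarks = X[rng.choice(n_steps, size=15, replace=False)]  # k = 15 landmark frames
chi = gaussian_basis(X, landmarks, sigma=0.5)
lam, vecs = vac_eigen(chi, lag=10)
psi = chi @ vecs[:, :2]   # two slowest approximate eigenfunctions along the trajectory
print(lam[:3])
```

The columns of `psi` play the role that TICA components play in the linear case, which is why such a basis can serve for dimension reduction. The truncation threshold `eps` matters in practice because Gaussian Gram matrices are often badly conditioned.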

So we could do something like a GaussianLandmarkVA estimator in PyEMMA which makes the appropriate calls. Essentially this could be a 1-week job including writing a paper, as really everything is there. Anyone interested in doing this, e.g. @fnuekse, @fabian-paul or @cwehmeyer? Would you like to get involved, João? I think I will have some time next week to think a little bit about how to extend the PyEMMA structure in this direction, to make it most useful.

Best, Frank.

(In reply to João M. Damas's message of 06/04/17, 17:50.)

--

Prof. Dr. Frank Noe Head of Computational Molecular Biology group Freie Universitaet Berlin

Phone: (+49) (0)30 838 75354 Web: research.franknoe.de Mail: Arnimallee 6, 14195 Berlin, Germany

j3mdamas commented 7 years ago

Hi Frank,

Sorry for the late reply. Thank you for your explanation; I could follow it and I think I got the point.

As you know, my interest is more at the application level. In particular, I'd like to benchmark how all the methods, tweaks and improvements compare when applied to ligand-binding systems (not only benzamidine-trypsin, but also some GPCR binding), not only at the analysis level, but also for improving adaptive sampling.

As for the development of the estimator you mention, I'm not sure how I could contribute, as I think you guys are the best at that :)

markovmodel / PyEMMA

New method: Landmark Kernel tICA #1069