markovmodel / PyEMMA

🚂 Python API for Emma's Markov Model Algorithms 🚂
http://pyemma.org
GNU Lesser General Public License v3.0

simplify param_add_data #138

Closed: franknoe closed this issue 9 years ago

franknoe commented 9 years ago

Having thought about the time-lagged data problem, I suggest the following change with respect to your latest TICA fix:

This convention forces each specific transformer to be explicit. Omitting the None check, or not taking the actual length of Y into account, will raise an exception, which is better than silently computing a wrong result. In contrast, I find the 'trick' of using part of Y to compute the mean at the end of a trajectory too convoluted; other people will have a hard time understanding this code.
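A minimal sketch of what "being explicit" could look like, assuming a chunk interface where X holds frames t, Y holds frames t + tau of the same chunk, Y is None once no lagged frames remain, and Y may be shorter than X near the end of a trajectory. The class `LaggedMomentAccumulator` and its `add_chunk` method are hypothetical illustrations, not PyEMMA's actual `param_add_data` code.

```python
import numpy as np


class LaggedMomentAccumulator:
    """Hypothetical accumulator for the time-lagged sum X^T Y (not PyEMMA's API).

    Assumed convention: X holds frames t, Y holds frames t + tau.
    Y is None at the end of a trajectory and may be shorter than X.
    """

    def __init__(self, dim):
        self.sum_xy = np.zeros((dim, dim))
        self.n_lagged = 0

    def add_chunk(self, X, Y):
        if Y is None:
            # No lagged partners left: nothing to add to the time-lagged sum.
            return
        n = len(Y)
        # Only the first len(Y) frames of X have a partner tau steps later.
        X_paired = X[:n]
        self.sum_xy += X_paired.T @ Y
        self.n_lagged += n


acc = LaggedMomentAccumulator(dim=2)
X = np.arange(10, dtype=float).reshape(5, 2)
Y = X[2:]                 # only 3 lagged frames available for this chunk
acc.add_chunk(X, Y)
acc.add_chunk(X, None)    # end of trajectory: explicitly ignored
```

If a transformer forgets the None check, `len(None)` raises a `TypeError`; if it forgets to truncate X, `X.T @ Y` raises a `ValueError` because of the shape mismatch. Either way the mistake surfaces as an exception rather than as a wrong estimate.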

franknoe commented 9 years ago

Before making code changes here, let's think about the mathematics a bit. Perhaps it is correct to compute the mean and the instantaneous correlation matrix on the T-tau data points only, in order to have a consistent normalization. In any case we are not computing the real mean (which would require reweighting out-of-equilibrium data), so perhaps all that is needed for the empirical mean is that it is computed consistently with the time-lagged quantities, such that all normalizations are OK and we can expect to get eigenvalues below 1.
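One possible reading of this suggestion, sketched under assumptions: the mean, C(0) and C(tau) are all estimated on the same T - tau frame pairs and divided by the same number of pairs. Taking the mean over the first T - tau frames (rather than the last T - tau) is an assumption, and `lagged_moments` is a hypothetical helper, not PyEMMA code.

```python
import numpy as np


def lagged_moments(traj, tau):
    """Mean, C(0) and C(tau) from the same T - tau frame pairs (sketch; assumes 1 <= tau < T)."""
    X = traj[:-tau]          # frames t = 0 .. T - tau - 1
    Y = traj[tau:]           # their partners at t + tau
    n_pairs = len(X)         # one common normalization for every quantity
    mean = X.mean(axis=0)    # empirical mean on the first T - tau frames only
    Xc = X - mean
    Yc = Y - mean
    C0 = Xc.T @ Xc / n_pairs
    Ctau = Xc.T @ Yc / n_pairs
    return mean, C0, Ctau


rng = np.random.default_rng(0)
traj = rng.normal(size=(100, 3))
mean, C0, Ctau = lagged_moments(traj, tau=5)
```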

gph82 commented 9 years ago

Hi, regardless of how one implements it (copying data from Y or the other way around), I do not have a clear opinion about this, except for the following: I believe one should use the estimates (means and covariances) of the dataset that will ultimately be transformed (i.e. all data). This way, one arrives at TICs that are actually mean-free and have unit variance (up to numerical precision etc.).
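A sketch of why this works, assuming the mean and C(0) come from all T frames while C(tau) uses the T - tau pairs. Symmetrizing C(tau) is an extra assumption needed here so that `scipy.linalg.eigh` can solve the generalized problem C(tau) r = lambda C(0) r; that routine returns eigenvectors normalized so that R^T C(0) R = I. Because the projected data are centered with the same mean and C(0) is their covariance, the TICs come out exactly mean-free with unit (population) variance. The helpers `tica_full_data` and `transform` are hypothetical, not PyEMMA's API.

```python
import numpy as np
from scipy.linalg import eigh


def tica_full_data(traj, tau):
    """Hypothetical helper: mean and C(0) from all T frames, C(tau) from the T - tau pairs."""
    mean = traj.mean(axis=0)
    Z = traj - mean
    C0 = Z.T @ Z / len(Z)                  # instantaneous covariance over all frames
    X, Y = Z[:-tau], Z[tau:]
    Ctau = X.T @ Y / len(X)                # time-lagged covariance over the pairs
    Ctau = 0.5 * (Ctau + Ctau.T)           # symmetrize so that eigh applies
    # Generalized problem C(tau) r = lambda C(0) r; eigh returns R with R.T @ C0 @ R = I
    eigvals, R = eigh(Ctau, C0)
    order = np.argsort(eigvals)[::-1]      # slowest processes first
    return eigvals[order], R[:, order], mean


def transform(traj, R, mean):
    """Project centered frames onto the TICA eigenvectors."""
    return (traj - mean) @ R


rng = np.random.default_rng(0)
traj = np.cumsum(rng.normal(size=(1000, 3)), axis=0)   # toy correlated data
eigvals, R, mean = tica_full_data(traj, tau=10)
tics = transform(traj, R, mean)
print(tics.mean(axis=0))   # zero up to round-off
print(tics.var(axis=0))    # one up to round-off
```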

Still, let's think about it.

(Have a nice weekend, everybody!)

franknoe commented 9 years ago

This is not at all clear to me. If you want to solve a generalized eigenvalue problem with covariance matrices C(0) and C(tau) obtained from empirical estimates, what is the correct way of estimating the mean such that these matrices are normalized correctly (e.g. such that we can always expect eigenvalues <= 1)?

It seems reasonable to use all data for the instantaneous correlation matrix. But perhaps for the time-lagged covariance matrix the answer is that the mean should be computed only on the T-tau frames.
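The thread leaves this question open. As a hedged illustration only, here is one estimator for which all generalized eigenvalues are guaranteed to lie in [-1, 1] for any finite data set: C(0) averages over both the t and t + tau frames of every pair, C(tau) is symmetrized, and both share one mean and one normalization 1/(2N). Then C(0) - C(tau) = (1/2N) sum (x - y)(x - y)^T and C(0) + C(tau) = (1/2N) sum (x + y)(x + y)^T are both positive semidefinite, which bounds the Rayleigh quotient r^T C(tau) r / r^T C(0) r. This is not claimed to be what PyEMMA implements, and `symmetrized_tica_eigenvalues` is a hypothetical name.

```python
import numpy as np
from scipy.linalg import eigh


def symmetrized_tica_eigenvalues(traj, tau):
    """Generalized eigenvalues of C(tau) r = lambda C(0) r, symmetrized estimator (sketch)."""
    X, Y = traj[:-tau], traj[tau:]
    n = len(X)
    # One mean for both halves: average over the 2(T - tau) frames entering the pairs
    mean = 0.5 * (X.mean(axis=0) + Y.mean(axis=0))
    Xc, Yc = X - mean, Y - mean
    # Same normalization 1/(2N) for both matrices
    C0 = (Xc.T @ Xc + Yc.T @ Yc) / (2 * n)
    Ctau = (Xc.T @ Yc + Yc.T @ Xc) / (2 * n)
    # C0 - Ctau and C0 + Ctau are positive semidefinite, so |lambda| <= 1
    return eigh(Ctau, C0, eigvals_only=True)


rng = np.random.default_rng(1)
short_traj = rng.normal(size=(30, 4)).cumsum(axis=0)   # deliberately little data
print(symmetrized_tica_eigenvalues(short_traj, tau=5))
```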



Prof. Dr. Frank Noe Head of Computational Molecular Biology group Freie Universitaet Berlin

Phone: (+49) (0)30 838 75354 Web: research.franknoe.de

Mail: Arnimallee 6, 14195 Berlin, Germany

fabian-paul commented 9 years ago

One requirement that we could impose is C_ii(0) >= C_ii(tau). I don't know yet how this relates to the eigenvalues of TICA, but I suspect that it could imply that the eigenvalues are <= 1. Can you show this? For infinite data, C_ii(0) >= C_ii(tau) follows from the rearrangement inequality. It would also hold for finite data if we normalized C_ii(0) and C_ii(tau) by the same number. This is not how the TICA code does it at the moment. Therefore I guess that eigenvalues <= 1 are not guaranteed by our implementation (for small amounts of data).
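A small check of the finite-data claim, under the assumption that C_ii(0) sums over all T frames, C_ii(tau) sums over the T - tau pairs, and both are divided by the same number. A short argument (via 2ab <= a^2 + b^2, rather than the rearrangement inequality) gives the bound: each product z_t z_(t+tau) is at most (z_t^2 + z_(t+tau)^2)/2, and summing over the pairs yields at most half of each of two partial sums of squares, each no larger than the full sum. This holds for any fixed centering. The function `diagonal_correlations` is hypothetical.

```python
import numpy as np


def diagonal_correlations(traj, tau):
    """Diagonals of C(0) and C(tau) with one common normalization (sketch; assumes 1 <= tau < T)."""
    Z = traj - traj.mean(axis=0)                          # any fixed centering works for the bound
    norm = len(Z)                                         # the same number divides both sums
    c0_diag = (Z ** 2).sum(axis=0) / norm                 # sum over all T frames
    ctau_diag = (Z[:-tau] * Z[tau:]).sum(axis=0) / norm   # sum over the T - tau pairs
    return c0_diag, ctau_diag


rng = np.random.default_rng(3)
c0, ctau = diagonal_correlations(rng.normal(size=(20, 3)).cumsum(axis=0), tau=2)
print(c0 >= ctau)   # elementwise C_ii(0) >= C_ii(tau)
```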

fabian-paul commented 9 years ago

Of course there is room for discussion about whether we need to require eigenvalues <= 1.

marscher commented 9 years ago

The data issue has been solved in #140 by Fabian, Guille and me. Do you want to continue the discussion about the math here or open a new issue?

franknoe commented 9 years ago

I'll open a new issue

franknoe commented 9 years ago

This discussion is continued in #143.