[coordinates.tica] include input mean option

gph82 commented 9 years ago

If we want to compute TICA at two (or more) different taus, it should be possible to skip the first pass of TICA once the mean is already known.

franknoe commented 9 years ago

And C0, which is more expensive.

Is it still true that we can skip the mean and C0 with the way of counting that Fabian has suggested? I think there both the mean and C0 might depend on tau, but I don't remember exactly.
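A minimal sketch of the reuse being discussed, assuming the plain counting scheme in which the mean and C0 are taken over all frames and therefore do not depend on tau. The function names are hypothetical and this is not PyEMMA's API; it only illustrates that the expensive first pass can be done once and shared across lag times.

```python
import numpy as np

def mean_and_c0(X):
    """First pass: mean and instantaneous covariance over all frames."""
    mean = X.mean(axis=0)
    Xc = X - mean
    c0 = Xc.T @ Xc / len(X)
    return mean, c0

def ctau_given_mean(X, mean, tau):
    """Second pass only: symmetrized time-lagged covariance with a known mean."""
    Xc = X - mean
    A, B = Xc[:-tau], Xc[tau:]
    ct = A.T @ B / len(A)
    return 0.5 * (ct + ct.T)

# Toy trajectory (random walk), purely for illustration.
rng = np.random.default_rng(0)
X = np.cumsum(rng.normal(size=(1000, 3)), axis=0)

mean, c0 = mean_and_c0(X)  # computed once
ctaus = {tau: ctau_given_mean(X, mean, tau) for tau in (1, 10, 100)}
```

If the counting scheme makes the mean and C0 depend on tau, this shortcut no longer applies as written.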

— Reply to this email directly or view it on GitHub https://github.com/markovmodel/PyEMMA/issues/331.

Prof. Dr. Frank Noe Head of Computational Molecular Biology group Freie Universitaet Berlin

Phone: (+49) (0)30 838 75354 Web: research.franknoe.de

Mail: Arnimallee 6, 14195 Berlin, Germany

gph82 commented 9 years ago

Yes, the less we recompute, the better.

However, before we get into implementing, testing, etc. the @fabian-paul method, just skipping the first pass should be easy to implement. I would prioritize this now. It's a huge timesaver for the FAH dataset.

franknoe commented 9 years ago

I suggest doing this in your own fork if you need it now.

For the devel branch and future releases, I would like to sort out the basic questions to have something stable that we won't change again in the subsequent version. Questions include:

The @fabian-paul method is already (optionally) used in pyEMMA, so if its means and C0 are tau-dependent, we must rule out that both options are used together. Also, we should discuss whether Fabian's counting method is what we want to stick with and then make it the standard option.
I think it is worth considering building separate estimators for correlation matrices and means and having TICA use them. We will also need them for other estimators such as the variational approach. But this is a model decision we need to discuss and evaluate.

marscher commented 9 years ago

This is somewhat related to #62. Of course we should avoid unnecessary recomputation. We can implement this in the setter of the lag property.
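A rough sketch of what such a setter could look like. The class `LaggedCovariance` is hypothetical and not part of PyEMMA; it assumes tau-independent mean and C0, caches them at construction, and only invalidates Ctau when the lag changes.

```python
import numpy as np

class LaggedCovariance:
    """Hypothetical helper: caches mean and C0, recomputes Ctau lazily."""

    def __init__(self, X, lag):
        self._X = np.asarray(X, dtype=float)
        self._mean = self._X.mean(axis=0)      # first pass, done once
        Xc = self._X - self._mean
        self._c0 = Xc.T @ Xc / len(Xc)
        self._ctau = None
        self.lag = lag

    @property
    def lag(self):
        return self._lag

    @lag.setter
    def lag(self, value):
        if not 0 < value < len(self._X):
            raise ValueError("lag must be between 1 and len(X) - 1")
        if getattr(self, "_lag", None) != value:
            self._lag = value
            self._ctau = None                  # only Ctau is invalidated

    @property
    def ctau(self):
        if self._ctau is None:
            Xc = self._X - self._mean
            n = len(Xc) - self._lag
            ct = Xc[:-self._lag].T @ Xc[self._lag:] / n
            self._ctau = 0.5 * (ct + ct.T)     # symmetrized estimate
        return self._ctau
```

Setting `est.lag = 5` on an existing instance then triggers only the lagged pass on the next access to `est.ctau`.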

franknoe commented 9 years ago

Let's postpone these things until after Melbourne. We have to discuss the organization of Estimators, and that also affects this question. I could go into more detail, but this is too complicated for an issue.

gph82 commented 9 years ago

I don't think this needs to be done before Melbourne. I've started to play a little bit on my fork, though.

Dr. Guillermo Pérez-Hernández Freie Universität Berlin Institute for Mathematics Arnimallee 6 D-14195 Berlin tel 0049 30 838 75776

http://userpage.fu-berlin.de/gph82/

franknoe commented 9 years ago

OK great

fabian-paul commented 9 years ago

Yes, with my method, the mean and C0 depend on tau. So you always want to estimate the triple (mean, C0, Ctau) in one go. When you change the lag time, you need to estimate a new triple. This is analogous to MSMs, where the row sum of the count matrix is also lag-time dependent (even though for infinite statistics it shouldn't be). Before we worry too much about whether some optimizations are compatible with my algorithm, we should check the practical advantage of my algorithm. Maybe it isn't worth it.
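The thread does not spell out the counting method, so the following is only one plausible reading: every quantity is estimated from the time-lagged pairs (x_t, x_{t+tau}) alone, in symmetrized form. The function name `lagged_triple` is hypothetical. In this form C0 + Ctau and C0 - Ctau are both positive semidefinite, so the generalized eigenvalues cannot exceed 1 in magnitude, and the mean and C0 visibly change with tau.

```python
import numpy as np

def lagged_triple(X, tau):
    """Estimate (mean, C0, Ctau) from the pairs (x_t, x_{t+tau}) only."""
    A, B = X[:-tau], X[tau:]
    n = len(A)
    # The mean uses exactly the frames that enter the lagged pairs.
    mean = 0.5 * (A.mean(axis=0) + B.mean(axis=0))
    Ac, Bc = A - mean, B - mean
    c0 = 0.5 * (Ac.T @ Ac + Bc.T @ Bc) / n
    ctau = 0.5 * (Ac.T @ Bc + Bc.T @ Ac) / n
    return mean, c0, ctau

# Toy trajectory (random walk), purely for illustration.
rng = np.random.default_rng(0)
X = np.cumsum(rng.normal(size=(500, 3)), axis=0)

for tau in (1, 50):
    mean, c0, ctau = lagged_triple(X, tau)
    print(tau, mean)  # the mean differs between the two lag times
```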

Because @gph82 mentioned it: I have asked Jan-Hendrik how to include the effect of out-of-equilibrium initial conditions into TICA. He said that the solution is simply not to symmetrize Ctau. However, the consequence of not symmetrizing Ctau is that we need a different algorithm for solving the (generalized) eigenvalue problem. I think it is not known how to do this for large matrices (C0 and Ctau).

franknoe commented 9 years ago

On 03/06/15 at 14:34, fabian-paul wrote:

> Yes, with my method, the mean and C0 depend on tau. So you always want to estimate the triple (mean, C0, Ctau) in one go. When you change the lag time, you need to estimate a new triple. This is analogous to MSMs, where the row sum of the count matrix is also lag-time dependent (even though for infinite statistics it shouldn't be). Before we worry too much about whether some optimizations are compatible with my algorithm, we should check the practical advantage of my algorithm. Maybe it isn't worth it.

I would rather use an algorithm that ensures that our matrices have the right structure. It's easy to have examples where the eigenvalues are close to 1, so I would rather have them below 1 than have them about 1 in that case. Still there may be optimization potential. Probably we do a lot of extra calculations because the C(0)'s are almost the same for different tau's, so I guess we could re-use some quantities.

> Because @gph82 mentioned it: I have asked Jan-Hendrik how to include the effect of out-of-equilibrium initial conditions into TICA. He said that the solution is simply not to symmetrize Ctau.

I don't think that's the right answer. The non-symmetry and out-of-equilibrium starting conditions are completely different things. You can get nonsymmetric estimates even if the starting conditions are in equilibrium, simply as a result of a finite time series.

Unless you really have nonreversible dynamics, Ctau should always be symmetric, because we know that the dynamics would give rise to a symmetric Ctau asymptotically. In MSMs we have a reversible MLE (which generates a symmetric Ctau for that model). For general correlation matrices we can use Hao's OOM Theory to do that.

> However, the consequence of not symmetrizing Ctau is that we need a different algorithm for solving the (generalized) eigenvalue problem. I think it is not known how to do this for large matrices (C0 and Ctau).

Maybe by doing an SVD of C0, truncating the spectrum, using the result to basis-transform Ctau and then solving an ordinary nonsymmetric eigenvalue problem. But I don't think that's the right approach.
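For reference, the procedure just described can be sketched in NumPy as follows. The function name and the `eps` cutoff are assumptions made for illustration. Because C0 is symmetric positive semidefinite, its eigendecomposition is used here in place of an explicit SVD.

```python
import numpy as np

def nonsymmetric_tica(c0, ctau, eps=1e-10):
    """Solve Ctau v = lambda C0 v without symmetrizing Ctau."""
    # Eigendecomposition of C0 (equivalent to its SVD for symmetric PSD input).
    s, U = np.linalg.eigh(c0)
    keep = s > eps * s.max()                # truncate the spectrum
    L = U[:, keep] / np.sqrt(s[keep])       # whitening: L.T @ c0 @ L = I
    K = L.T @ ctau @ L                      # Ctau in the whitened basis
    evals, W = np.linalg.eig(K)             # ordinary nonsymmetric problem
    order = np.argsort(-np.abs(evals))
    # Eigenvalues and eigenvectors may be complex for a nonsymmetric Ctau.
    return evals[order], L @ W[:, order]
```

The cost is dominated by two dense decompositions of the feature-space matrices, which is the part that becomes questionable for large C0 and Ctau.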
