Add correlation matrix estimator

C0/Ctau estimator can be copied from pyEMMA's TICA

I have just made a first commit for this class in the Estimator branch. Since I'm doing all of this for the first time, I have a lot of questions:

Regarding the class itself: The tica class in pyemma was written in a quite advanced way that goes together well with pyemma's pipelining system. The function that adds a trajectory gets a number of additional arguments, like an argument whether or not this piece is the last chunk in a trajectory, etc. Most importantly, the function is actually called twice for each dataset: The first time is only for the computation of the mean, the second time is for the computation of the matrices once the mean has already been estimated. This is necessary as we need to use the whole data to compute the correct mean. I wonder if we should keep it a lot simpler here, at least for the start. We could leave out the option to compute the mean entirely as this is usually not needed for general basis sets. For the moment, we could also assume that the data given to the function "add" is a complete trajectory.
I have implemented the standard estimator now. Shall I also include Fabian's estimator as an option?
Where are standard imports like numpy etc . carried out? At the top of each file or is there another place where these imports go?
What is the procedure once a branch like this has come to an acceptable state?

Am 28/07/15 um 12:40 schrieb Feliks Nüske:

I have just made a first commit for this class in the Estimator branch. Since I'm doing all of this for the first time, I have a lot of questions:

*
Regarding the class itself: The tica class in pyemma was written
in a quite advanced way that goes together well with pyemma's
pipelining system. The function that adds a trajectory gets a
number of additional arguments, like an argument whether or not
this piece is the last chunk in a trajectory, etc.
We don't need to copy all the complexity now to have a functional variational package. Let's just use full trajectories for now. So something like add(X) where X is a Txn basis set trajectory.

*
Most importantly, the function is actually called twice for each
dataset: The first time is only for the computation of the mean,
the second time is for the computation of the matrices once the
mean has already been estimated. This is necessary as we need to
use the whole data to compute the correct mean. I wonder if we
should keep it a lot simpler here, at least for the start. We
could leave out the option to compute the mean entirely as this is
usually not needed for general basis sets. For the moment, we
could also assume that the data given to the function "add" is a
complete trajectory.
I agree. For now we don't implement subtracting the mean, as this is usually not needed in the variational approach. There are actually ways to compute moments on-the-fly robustly without having to compute <x^2> -
^2 in the end, but they are a bit more involved. Using one of these on-the-fly methods is the right way to go in the longer term, and that way we only need one pass through the data. * ``` I have implemented the standard estimator now. Shall I also include Fabian's estimator as an option? ``` I think Fabian's estimator is always the better option. Let's just use that. * ``` Where are standard imports like numpy etc . carried out? At the top of each file or is there another place where these imports go? ``` Usually at the top of the file. If you need something only in one spot, you can also do that within a function. * ``` What is the procedure once a branch like this has come to an acceptable state? ``` Then one creates a pull request github (best ask Fabian or Martin about the details while I'm traveling). That will at least make a diff and then someone else (in this case me) can inspect the changes. If the reviewer accepts them, they can merge the branch (in this case devel or something else into master), and your branch can then be deleted. I am saying "at least a diff" because one usually sets up the git repository that at this point the test suite will be automatically run by github, and you can check if everything went fine on all supported platforms (linux, mac, windows), or whether any of your changes has introduced a problem. This is a very very useful feature. But since we don't have tests in this experimental package yet, this step is momentarily missing. — Reply to this email directly or view it on GitHub https://github.com/markovmodel/variational/issues/2#issuecomment-125551486.

Prof. Dr. Frank Noe Head of Computational Molecular Biology group Freie Universitaet Berlin

Phone: (+49) (0)30 838 75354 Web: research.franknoe.de

Mail: Arnimallee 6, 14195 Berlin, Germany

Ok, thanks.

Am 28.07.15 um 12:51 schrieb Frank Noe:

Am 28/07/15 um 12:40 schrieb Feliks Nüske:

I have just made a first commit for this class in the Estimator branch. Since I'm doing all of this for the first time, I have a lot of questions:

*

Regarding the class itself: The tica class in pyemma was written in a quite advanced way that goes together well with pyemma's pipelining system. The function that adds a trajectory gets a number of additional arguments, like an argument whether or not this piece is the last chunk in a trajectory, etc.

We don't need to copy all the complexity now to have a functional variational package. Let's just use full trajectories for now. So something like add(X) where X is a Txn basis set trajectory.

*

Most importantly, the function is actually called twice for each dataset: The first time is only for the computation of the mean, the second time is for the computation of the matrices once the mean has already been estimated. This is necessary as we need to use the whole data to compute the correct mean. I wonder if we should keep it a lot simpler here, at least for the start. We could leave out the option to compute the mean entirely as this is usually not needed for general basis sets. For the moment, we could also assume that the data given to the function "add" is a complete trajectory.

I agree. For now we don't implement subtracting the mean, as this is usually not needed in the variational approach. There are actually ways to compute moments on-the-fly robustly without having to compute <x^2> -
^2 in the end, but they are a bit more involved. Using one of these on-the-fly methods is the right way to go in the longer term, and that way we only need one pass through the data. * I have implemented the standard estimator now. Shall I also include Fabian's estimator as an option? I think Fabian's estimator is always the better option. Let's just use that. * Where are standard imports like numpy etc . carried out? At the top of each file or is there another place where these imports go? Usually at the top of the file. If you need something only in one spot, you can also do that within a function. * What is the procedure once a branch like this has come to an acceptable state? Then one creates a pull request github (best ask Fabian or Martin about the details while I'm traveling). That will at least make a diff and then someone else (in this case me) can inspect the changes. If the reviewer accepts them, they can merge the branch (in this case devel or something else into master), and your branch can then be deleted. I am saying "at least a diff" because one usually sets up the git repository that at this point the test suite will be automatically run by github, and you can check if everything went fine on all supported platforms (linux, mac, windows), or whether any of your changes has introduced a problem. This is a very very useful feature. But since we don't have tests in this experimental package yet, this step is momentarily missing. — Reply to this email directly or view it on GitHub https://github.com/markovmodel/variational/issues/2#issuecomment-125551486.

Prof. Dr. Frank Noe Head of Computational Molecular Biology group Freie Universitaet Berlin

Phone: (+49) (0)30 838 75354 Web: research.franknoe.de

Mail: Arnimallee 6, 14195 Berlin, Germany

— Reply to this email directly or view it on GitHub https://github.com/markovmodel/variational/issues/2#issuecomment-125554340.

markovmodel / variational

Add correlation matrix estimator #2

Mail: Arnimallee 6, 14195 Berlin, Germany

Mail: Arnimallee 6, 14195 Berlin, Germany