markovmodel / variational

Basis sets, estimators and solvers for the variational approach of conformation dynamics. NOTE: the code has been merged with PyEMMA and is maintained there.
10 stars 5 forks source link

estimators.moments_XXXY: X and Y arrays get centered at different means. #30

Closed fnueske closed 7 years ago

fnueske commented 8 years ago

In estimators.moments_XXXY, the arrays X and Y are centered at different means (namely (1/w)_sx, (1/w)_sy) if remove_mean=True and symmetrize=False. Isn't that inconsistent? I think they should be centered around the same mean, namely (1/w)*sx.

@franknoe, if you agree, I would change that together with the inclusion of weights.

franknoe commented 8 years ago

Certainly the same mean doesn't work in general (you assume that X and Y are just time-shifted subsets of the same timeseried, but in this part of the code that's not granted - they could be completely different sets of data, so the mean of X and Y might not have anything to do with one another). Also I don't think that (1/w)*sx can guarantee eigenvalues <= 1 for the nonreversible case. My feeling is that the current version is correct, but I haven't proven it. Perhaps it can be shown based on Fabian's demonstration that our symmetrization approach guarantees meaningful eigenvalues?

Am 04/05/16 um 13:06 schrieb Feliks Nüske:

In estimators.moments_XXXY, the arrays X and Y are centered at different means (namely (1/w)/sx, (1/w)/sy) if remove_mean=True and symmetrize=False. Isn't that inconsistent? I think they should be centered around the same mean, namely (1/w)*sx.

@franknoe https://github.com/franknoe, if you agree, I would change that together with the inclusion of weights.

— You are receiving this because you were mentioned. Reply to this email directly or view it on GitHub https://github.com/markovmodel/variational/issues/30


Prof. Dr. Frank Noe Head of Computational Molecular Biology group Freie Universitaet Berlin

Phone: (+49) (0)30 838 75354 Web: research.franknoe.de

Mail: Arnimallee 6, 14195 Berlin, Germany

fabian-paul commented 8 years ago

I think it is unlikely that we can show that the eigenvalues will be <= 1 for the unsymmetrized case, if we subtract the means of X and Y independently. If cov(x,y,0) = \sum_i^{N-t} x_i y_i and cov(x,y,tau) = \sum_i^{N-t} xi y{i+tau} I think we can't show that cov(x,y,0) >= cov(x,y,tau), because different data go into the left and right hand sides of the inequality, so no general statement can be made. If we subtract the mean of x and y independently, this would stay true: the left and right hand sides would still depend on different data. If we use the same mean for all data, there could at least be a chance that we can show something.

franknoe commented 8 years ago

So do you think we should generally use the mean of x for both in the nonsymmetric case?

If so, I think we need to have some sort of flag to enforce this in the moments code, because if X and Y are not time-lagged but from two different data sources, it makes no sense to use the same mean for both.

Am 04/05/16 um 17:28 schrieb fabian-paul:

I think it is unlikely that we can show that the eigenvalues will be <= 1 for the unsymmetrized case, if we subtract the means of X and Y independently. If cov(x,y,0) = \sum_i^{N-t} x_i y_i and cov(x,y,tau) = \sum_i^{N-t} xi y{i+tau} I think we can't show that cov(x,y,0) >= cov(x,y,tau), because different data go into the left and right hand sides if the inequality, so no general statement can be made. If we subtract the mean of x and y independently, this would stay true: the left and right hand sides would still depend on different data. If we use the same mean for all data, there could at least be a chance that we can show something.

— You are receiving this because you were mentioned. Reply to this email directly or view it on GitHub https://github.com/markovmodel/variational/issues/30#issuecomment-216901928


Prof. Dr. Frank Noe Head of Computational Molecular Biology group Freie Universitaet Berlin

Phone: (+49) (0)30 838 75354 Web: research.franknoe.de

Mail: Arnimallee 6, 14195 Berlin, Germany

fnueske commented 8 years ago

Indeed, I would suggest to make using the same mean for both X and Y the default case of the moments code, as we almost exclusively use the moments code for time lagged data. For time lagged data (without symmetrization), using the mean of X for both time series is the most consistent treatment, if we have the short trajectory estimation methods in mind.

The user could be allowed to override this, however.

Am 04.05.16 um 19:09 schrieb Frank Noe:

So do you think we should generally use the mean of x for both in the nonsymmetric case?

If so, I think we need to have some sort of flag to enforce this in the moments code, because if X and Y are not time-lagged but from two different data sources, it makes no sense to use the same mean for both.

Am 04/05/16 um 17:28 schrieb fabian-paul:

I think it is unlikely that we can show that the eigenvalues will be <= 1 for the unsymmetrized case, if we subtract the means of X and Y independently. If cov(x,y,0) = \sum_i^{N-t} x_i y_i and cov(x,y,tau) = \sum_i^{N-t} xi y{i+tau} I think we can't show that cov(x,y,0) >= cov(x,y,tau), because different data go into the left and right hand sides if the inequality, so no general statement can be made. If we subtract the mean of x and y independently, this would stay true: the left and right hand sides would still depend on different data. If we use the same mean for all data, there could at least be a chance that we can show something.

— You are receiving this because you were mentioned. Reply to this email directly or view it on GitHub

https://github.com/markovmodel/variational/issues/30#issuecomment-216901928


Prof. Dr. Frank Noe Head of Computational Molecular Biology group Freie Universitaet Berlin

Phone: (+49) (0)30 838 75354 Web: research.franknoe.de

Mail: Arnimallee 6, 14195 Berlin, Germany

— You are receiving this because you authored the thread. Reply to this email directly or view it on GitHub https://github.com/markovmodel/variational/issues/30#issuecomment-216934523

franknoe commented 8 years ago

So we need a flag for this. But I would set it such that the different means are used by default because this module doesn't know anything about time-lagging. The defaults should be set such that they are meaningful within the module. We can override from outside to restrict the mean to x.

Am 05/05/16 um 20:06 schrieb Feliks Nüske:

Indeed, I would suggest to make using the same mean for both X and Y the default case of the moments code, as we almost exclusively use the moments code for time lagged data. For time lagged data (without symmetrization), using the mean of X for both time series is the most consistent treatment, if we have the short trajectory estimation methods in mind.

The user could be allowed to override this, however.

Am 04.05.16 um 19:09 schrieb Frank Noe:

So do you think we should generally use the mean of x for both in the nonsymmetric case?

If so, I think we need to have some sort of flag to enforce this in the moments code, because if X and Y are not time-lagged but from two different data sources, it makes no sense to use the same mean for both.

Am 04/05/16 um 17:28 schrieb fabian-paul:

I think it is unlikely that we can show that the eigenvalues will be <= 1 for the unsymmetrized case, if we subtract the means of X and Y independently. If cov(x,y,0) = \sum_i^{N-t} x_i y_i and cov(x,y,tau) = \sum_i^{N-t} xi y{i+tau} I think we can't show that cov(x,y,0) >= cov(x,y,tau), because different data go into the left and right hand sides if the inequality, so no general statement can be made. If we subtract the mean of x and y independently, this would stay true: the left and right hand sides would still depend on different data. If we use the same mean for all data, there could at least be a chance that we can show something.

— You are receiving this because you were mentioned. Reply to this email directly or view it on GitHub

https://github.com/markovmodel/variational/issues/30#issuecomment-216901928


Prof. Dr. Frank Noe Head of Computational Molecular Biology group Freie Universitaet Berlin

Phone: (+49) (0)30 838 75354 Web: research.franknoe.de

Mail: Arnimallee 6, 14195 Berlin, Germany

— You are receiving this because you authored the thread. Reply to this email directly or view it on GitHub

https://github.com/markovmodel/variational/issues/30#issuecomment-216934523

— You are receiving this because you were mentioned. Reply to this email directly or view it on GitHub https://github.com/markovmodel/variational/issues/30#issuecomment-217228878


Prof. Dr. Frank Noe Head of Computational Molecular Biology group Freie Universitaet Berlin

Phone: (+49) (0)30 838 75354 Web: research.franknoe.de

Mail: Arnimallee 6, 14195 Berlin, Germany

fnueske commented 8 years ago

Ok, I'll include this in the changes I'm making right now.

Am 05.05.16 um 20:10 schrieb Frank Noe:

So we need a flag for this. But I would set it such that the different means are used by default because this module doesn't know anything about time-lagging. The defaults should be set such that they are meaningful within the module. We can override from outside to restrict the mean to x.

Am 05/05/16 um 20:06 schrieb Feliks Nüske:

Indeed, I would suggest to make using the same mean for both X and Y the default case of the moments code, as we almost exclusively use the moments code for time lagged data. For time lagged data (without symmetrization), using the mean of X for both time series is the most consistent treatment, if we have the short trajectory estimation methods in mind.

The user could be allowed to override this, however.

Am 04.05.16 um 19:09 schrieb Frank Noe:

So do you think we should generally use the mean of x for both in the nonsymmetric case?

If so, I think we need to have some sort of flag to enforce this in the moments code, because if X and Y are not time-lagged but from two different data sources, it makes no sense to use the same mean for both.

Am 04/05/16 um 17:28 schrieb fabian-paul:

I think it is unlikely that we can show that the eigenvalues will be <= 1 for the unsymmetrized case, if we subtract the means of X and Y independently. If cov(x,y,0) = \sum_i^{N-t} x_i y_i and cov(x,y,tau) = \sum_i^{N-t} xi y{i+tau} I think we can't show that cov(x,y,0) >= cov(x,y,tau), because different data go into the left and right hand sides if the inequality, so no general statement can be made. If we subtract the mean of x and y independently, this would stay true: the left and right hand sides would still depend on different data. If we use the same mean for all data, there could at least be a chance that we can show something.

— You are receiving this because you were mentioned. Reply to this email directly or view it on GitHub

https://github.com/markovmodel/variational/issues/30#issuecomment-216901928


Prof. Dr. Frank Noe Head of Computational Molecular Biology group Freie Universitaet Berlin

Phone: (+49) (0)30 838 75354 Web: research.franknoe.de

Mail: Arnimallee 6, 14195 Berlin, Germany

— You are receiving this because you authored the thread. Reply to this email directly or view it on GitHub

https://github.com/markovmodel/variational/issues/30#issuecomment-216934523

— You are receiving this because you were mentioned. Reply to this email directly or view it on GitHub

https://github.com/markovmodel/variational/issues/30#issuecomment-217228878


Prof. Dr. Frank Noe Head of Computational Molecular Biology group Freie Universitaet Berlin

Phone: (+49) (0)30 838 75354 Web: research.franknoe.de

Mail: Arnimallee 6, 14195 Berlin, Germany

— You are receiving this because you authored the thread. Reply to this email directly or view it on GitHub https://github.com/markovmodel/variational/issues/30#issuecomment-217229758

franknoe commented 8 years ago

thank you!

Am 05/05/16 um 20:13 schrieb Feliks Nüske:

Ok, I'll include this in the changes I'm making right now.

Am 05.05.16 um 20:10 schrieb Frank Noe:

So we need a flag for this. But I would set it such that the different means are used by default because this module doesn't know anything about time-lagging. The defaults should be set such that they are meaningful within the module. We can override from outside to restrict the mean to x.

Am 05/05/16 um 20:06 schrieb Feliks Nüske:

Indeed, I would suggest to make using the same mean for both X and Y the default case of the moments code, as we almost exclusively use the moments code for time lagged data. For time lagged data (without symmetrization), using the mean of X for both time series is the most consistent treatment, if we have the short trajectory estimation methods in mind.

The user could be allowed to override this, however.

Am 04.05.16 um 19:09 schrieb Frank Noe:

So do you think we should generally use the mean of x for both in the nonsymmetric case?

If so, I think we need to have some sort of flag to enforce this in the moments code, because if X and Y are not time-lagged but from two different data sources, it makes no sense to use the same mean for both.

Am 04/05/16 um 17:28 schrieb fabian-paul:

I think it is unlikely that we can show that the eigenvalues will be <= 1 for the unsymmetrized case, if we subtract the means of X and Y independently. If cov(x,y,0) = \sum_i^{N-t} x_i y_i and cov(x,y,tau) = \sum_i^{N-t} xi y{i+tau} I think we can't show that cov(x,y,0) >= cov(x,y,tau), because different data go into the left and right hand sides if the inequality, so no general statement can be made. If we subtract the mean of x and y independently, this would stay true: the left and right hand sides would still depend on different data. If we use the same mean for all data, there could at least be a chance that we can show something.

— You are receiving this because you were mentioned. Reply to this email directly or view it on GitHub

https://github.com/markovmodel/variational/issues/30#issuecomment-216901928


Prof. Dr. Frank Noe Head of Computational Molecular Biology group Freie Universitaet Berlin

Phone: (+49) (0)30 838 75354 Web: research.franknoe.de

Mail: Arnimallee 6, 14195 Berlin, Germany

— You are receiving this because you authored the thread. Reply to this email directly or view it on GitHub

https://github.com/markovmodel/variational/issues/30#issuecomment-216934523

— You are receiving this because you were mentioned. Reply to this email directly or view it on GitHub

https://github.com/markovmodel/variational/issues/30#issuecomment-217228878


Prof. Dr. Frank Noe Head of Computational Molecular Biology group Freie Universitaet Berlin

Phone: (+49) (0)30 838 75354 Web: research.franknoe.de

Mail: Arnimallee 6, 14195 Berlin, Germany

— You are receiving this because you authored the thread. Reply to this email directly or view it on GitHub

https://github.com/markovmodel/variational/issues/30#issuecomment-217229758

— You are receiving this because you were mentioned. Reply to this email directly or view it on GitHub https://github.com/markovmodel/variational/issues/30#issuecomment-217230647


Prof. Dr. Frank Noe Head of Computational Molecular Biology group Freie Universitaet Berlin

Phone: (+49) (0)30 838 75354 Web: research.franknoe.de

Mail: Arnimallee 6, 14195 Berlin, Germany

fnueske commented 8 years ago

Hey @franknoe and @fabian-paul , I have been trying to sort out all possible options for the estimation of mean-free correlations and collected the results in the attached document, re-using much of what you have found earlier. I think now that it makes sense to treat the following cases (standard = just summing up the correlations, mean-free = removing the mean):

As explained in the document, I think there is a problem with updating a mean-free estimate for time-lagged data without symmetrization: On the one hand, it is more consistent to define the mean only using the x-time series; on the other hand, we have no updating formula in this case. But I wonder if it is really desirable to remove the mean from, e.g., time-lagged correlations of short simulations.

Can I have your opinion on this? Thank you! Updating_Formulas.pdf

fabian-paul commented 8 years ago

Why is the weighted correlation defined as ?

Shouldn't it rather be something like ?

If we imagine that we reweight every frame, then correlations, which are related to joint probabilities, should involve two weights. Or not? I guess things would be much different then.

franknoe commented 8 years ago

yes, I think you are right.

fabian-paul commented 8 years ago

For the time-lagged covariance, two weights seem more appropriate than for the instantaneous covariance, because x and y come from the same frame in C(0). But maybe we get equations that work with chunking, if we look at the general case with two weights.

franknoe commented 8 years ago

I'm confused. Both the mean and covariance are expectations, mu = E[x(t)], cov = E[x(t) y(t)]. So when replacing this by a weighted expectation, you would only get a single weight in each case. Maybe we can interpret w_x*w_y as a single weight, but is that the only possible choice? Will think more about this tomorrow...

fnueske commented 8 years ago

Thanks for your comments! I agree with Frank that if we compute weighted averages, we should use a single weight for every term that is summed up. I do not know what the meaning of two weights would be.

Along these lines, I would suggest now to compute separate means for both the x- and y-series using only the first T weights w_t, t=1,...,T (where the original time-series has T+\tau steps). This is like saying that at time t, we make the observation x_t, and the time-lagged observation yt = x{t+\tau}, and we use the weights w_t to average over them and to correlate them. To me, this seems consistent, and it reproduces the old updating formula. Please have a look at the updated document to see what I mean. Updating_Formulas.pdf

franknoe commented 8 years ago

Am 18/05/16 um 11:27 schrieb Feliks Nüske:

Thanks for your comments! I agree with Frank that if we compute weighted averages, we should use a single weight for every term that is summed up. I do not know what the meaning of two weights would be.

Feliks: I guess the idea would be that you could treat w(t) = w_x(t) * w_y(t) as a single weight for this time step. The product is a "recipe" to construct this weight from the weights of the individual trajectory time steps. So it's not necessarily a contradiction to your approach, but I am still not sure if this "recipe" is correct or unique.

Along these lines, I would suggest now to compute separate means for both the x- and y-series using only the first T weights w_t, t=1,...,T (where the original time-series has T+\tau steps). This is like saying that at time t, we make the observation x_t, and the time-lagged observation yt = x{t+\tau}, and we use the weights w_t to average over them and to correlate them. To me, this seems consistent, and it reproduces the old updating formula. Please have a look at the updated document to see what I mean. Updating_Formulas.pdf https://github.com/markovmodel/variational/files/270069/Updating_Formulas.pdf

OK, will do.

— You are receiving this because you were mentioned. Reply to this email directly or view it on GitHub https://github.com/markovmodel/variational/issues/30#issuecomment-219973098


Prof. Dr. Frank Noe Head of Computational Molecular Biology group Freie Universitaet Berlin

Phone: (+49) (0)30 838 75354 Web: research.franknoe.de

Mail: Arnimallee 6, 14195 Berlin, Germany

fabian-paul commented 8 years ago

I just want to understand why we are doing what we are doing, before I get too much into the modeling. I'm not familiar with the origin of there weights, but I can think of two settings:

Btw, I think if it is possible to reweight the covariances towards the equilibrium ensemble, then the reweighed matrices could be symmetrized. So I seem to disagree with Feliks here...

fnueske commented 8 years ago

What I had in mind was the first application, the trajectory re-weighting. I agree there should be a weight to correct the probability of observing x_t, but I don't think we intend to change the conditional transition probability, because it is given by the dynamics. So I'm still in favour of one weight only. I haven't thought about the second application.

About your last comment: That's a good point. And I think it also works if, again, we use the same weights for x and y. This really seems to be what caused all the confusion for me. I will work this out today.

fabian-paul commented 8 years ago

Oh, I thought this was somehow related to https://github.com/markovmodel/PyEMMA/issues/749 which seems to be a weighted ensemble application. You're of course right in having one reweighing factor if the conditional probabilities are unchanged.

franknoe commented 8 years ago

It is related, but I don't think we thought about changing the conditional probabilities there either.

Oh, I thought this was somehow related to https://github.com/markovmodel/PyEMMA/issues/749 which seems to be a weighted ensemble application. You're of course right in having one reweighing factor if the conditional probabilities are unchanged.


You are receiving this because you were mentioned. Reply to this email directly or view it on GitHub: https://github.com/markovmodel/variational/issues/30#issuecomment-220382509

fabian-paul commented 8 years ago

@franknoe aren't you changing the conditional probabilities, when a trajectory is split?

franknoe commented 8 years ago

What do you mean by "when a trajectory is split"? We have a fixed dataset we are working with, no?

@franknoe aren't you changing the conditional probabilities, when a trajectory is split?


You are receiving this because you were mentioned. Reply to this email directly or view it on GitHub: https://github.com/markovmodel/variational/issues/30#issuecomment-220417277

fnueske commented 8 years ago

Ok, from my perspective this is settled. If we use the same weights for x and y, also in the symmetric case, then all cases can be handled. I have implemented this and all tests are fine.

If you have no further concerns, this is ready to be merged.

franknoe commented 8 years ago

Thanks Feliks. Please do a pull request, and I'll review it.