markovmodel / PyEMMA

🚂 Python API for Emma's Markov Model Algorithms 🚂
http://pyemma.org
GNU Lesser General Public License v3.0
307 stars 118 forks source link

Exception: Stationary distribution entries smaller than 1e-15 #714

Closed stefdoerr closed 6 years ago

stefdoerr commented 8 years ago

After the update I started getting these errors. What used to happen before you introduced these error messages? How should I handle them? As far as I remember, the same code used to run quietly before.

/shared/sdoerr/Software/anaconda3/lib/python3.4/site-packages/pyEMMA-2.0.4-py3.4-linux-x86_64.egg/pyemma/msm/api.py in estimate_markov_model(dtrajs, lag, reversible, statdist, count_mode, sparse, connectivity, dt_traj, maxiter, maxerr)
    501                     maxerr=maxerr)
    502     # estimate and return
--> 503     return mlmsm.estimate(dtrajs)
    504 
    505 

/shared/sdoerr/Software/anaconda3/lib/python3.4/site-packages/pyEMMA-2.0.4-py3.4-linux-x86_64.egg/pyemma/_base/estimator.py in estimate(self, X, **params)
    339         if params:
    340             self.set_params(**params)
--> 341         self._model = self._estimate(X)
    342         self._estimated = True
    343         return self._model

/shared/sdoerr/Software/anaconda3/lib/python3.4/site-packages/pyEMMA-2.0.4-py3.4-linux-x86_64.egg/pyemma/msm/estimators/maximum_likelihood_msm.py in _estimate(self, dtrajs)
    263             P = msmest.transition_matrix(self._C_active, reversible=self.reversible,
    264                                          mu=statdist_active, maxiter=self.maxiter,
--> 265                                          maxerr=self.maxerr)
    266         elif self.connectivity == 'none':
    267             # reversible mode only possible if active set is connected

/shared/sdoerr/Software/anaconda3/lib/python3.4/site-packages/msmtools/estimation/api.py in transition_matrix(C, reversible, mu, method, **kwargs)
    939         if mu is None:
    940             if sparse_computation:
--> 941                 T = sparse.mle_trev.mle_trev(C, **kwargs)
    942             else:
    943                 T = dense.mle_trev.mle_trev(C, **kwargs)

msmtools/estimation/sparse/mle_trev.pyx in msmtools.estimation.sparse.mle_trev.mle_trev (msmtools/estimation/sparse/mle_trev.c:2266)()

Exception: Stationary distribution contains entries smaller than 1e-15 during iteration
franknoe commented 8 years ago

There has been some changes affecting the scipy sparse matrix routines, perhaps the different behavior is due to that. But let's first check if the problem is due to PyEMMA: Can you PyEMMA downgrade to 2.0.4 and varify that the same dtrajs don't give rise to that error?

In any case, the error is probably a result of pathological case, but the message is not helpful. If you send me a file with your dtrajs, I'm happy to check why this happens.

Am 29/02/16 um 15:50 schrieb Stefan:

After the update I started getting these errors. What used to happen before you introduced these errors? How should I handle them? As far as I remember, the same code used to run quietly before.

/shared/sdoerr/Software/anaconda3/lib/python3.4/site-packages/pyEMMA-2.0.4-py3.4-linux-x86_64.egg/pyemma/msm/api.pyin estimate_markov_model(dtrajs, lag, reversible, statdist, count_mode, sparse, connectivity, dt_traj, maxiter, maxerr) 501 maxerr=maxerr) 502 # estimate and return --> 503 return mlmsm.estimate(dtrajs) 504
505

/shared/sdoerr/Software/anaconda3/lib/python3.4/site-packages/pyEMMA-2.0.4-py3.4-linux-x86_64.egg/pyemma/_base/estimator.pyin estimate(self,X,params) 339 if params: 340 self.set_params(params) --> 341 self._model= self._estimate(X) 342 self._estimated= True 343 return self._model

/shared/sdoerr/Software/anaconda3/lib/python3.4/site-packages/pyEMMA-2.0.4-py3.4-linux-x86_64.egg/pyemma/msm/estimators/maximum_likelihood_msm.pyin _estimate(self, dtrajs) 263 P = msmest.transition_matrix(self._C_active,reversible=self.reversible, 264 mu=statdist_active,maxiter=self.maxiter, --> 265 maxerr=self.maxerr) 266 elif self.connectivity== 'none': 267 # reversible mode only possible if active set is connected

/shared/sdoerr/Software/anaconda3/lib/python3.4/site-packages/msmtools/estimation/api.pyin transition_matrix(C, reversible, mu, method,kwargs) 939 if muis None: 940 if sparse_computation: --> 941 T = sparse.mle_trev.mle_trev(C,kwargs) 942 else: 943 T = dense.mle_trev.mle_trev(C,**kwargs)

msmtools/estimation/sparse/mle_trev.pyxin msmtools.estimation.sparse.mle_trev.mle_trev (msmtools/estimation/sparse/mle_trev.c:2266)()

Exception: Stationary distribution contains entries smaller than1e-15 during iteration

— Reply to this email directly or view it on GitHub https://github.com/markovmodel/PyEMMA/issues/714.


Prof. Dr. Frank Noe Head of Computational Molecular Biology group Freie Universitaet Berlin

Phone: (+49) (0)30 838 75354 Web: research.franknoe.de

Mail: Arnimallee 6, 14195 Berlin, Germany

stefdoerr commented 8 years ago

Actually you are right, it might not have to do with the update. I reduced the amount of data I used so that will probably be the case. Once I've figured it out I will come back. The error sounds though like it could be caused by a very underpopulated cluster.

franknoe commented 8 years ago

Sure. In any case, I'm happy to look for the exact reason for this. Ideally, PyEMMA should catch these things before they happen or at least give an understandable error.

Am 29/02/16 um 16:19 schrieb Stefan:

Actually you are right, it might not have to do with the update. I reduced the amount of data I used so that will probably be the case. Once I've figured it out I will come back. The error sounds though like it could be caused by a very underpopulated cluster.

— Reply to this email directly or view it on GitHub https://github.com/markovmodel/PyEMMA/issues/714#issuecomment-190254503.


Prof. Dr. Frank Noe Head of Computational Molecular Biology group Freie Universitaet Berlin

Phone: (+49) (0)30 838 75354 Web: research.franknoe.de

Mail: Arnimallee 6, 14195 Berlin, Germany

marscher commented 8 years ago

@stefdoerr you're right this error message refers to an unpopulated cluster. The problem with that is the fact of resulting eigenvalues bigger than one, if one has such tiny elements in the stationary distribution.

markovmodel/msmtools/pull/68

stefdoerr commented 8 years ago

Here is an example dtraj that causes the error dtraj.npy.zip Doing a bincount shows multiple clusters with population 1. In any case, you can use the example if you want to test the error to report some more helpful error message :)

stefdoerr commented 8 years ago

Sorry, forgot to give you the lag time etc.

import numpy as np
import pyemma
z = np.load('dtraj.npy')
pyemma.msm.estimate_markov_model(z.tolist(), 200)

This should do it

franknoe commented 8 years ago

thanks!

Am 02/03/16 um 14:07 schrieb Stefan:

Doing a bincount shows multiple clusters with population 1. In any case, you can use the example if you want to test the error to report some more helpful error message :)

— Reply to this email directly or view it on GitHub https://github.com/markovmodel/PyEMMA/issues/714#issuecomment-191232008.


Prof. Dr. Frank Noe Head of Computational Molecular Biology group Freie Universitaet Berlin

Phone: (+49) (0)30 838 75354 Web: research.franknoe.de

Mail: Arnimallee 6, 14195 Berlin, Germany

franknoe commented 8 years ago

I think it's rather an msmtools issue. This code has the same behavior:

import numpy as np
import msmtools

z = np.load('dtraj.npy')
C = msmtools.estimation.count_matrix(z.tolist(), 200)
Csub = msmtools.estimation.largest_connected_submatrix(C)

# this matrix is connected!
print msmtools.estimation.is_connected(Csub)

# this fails with the following exception:
# Exception: Stationary distribution contains entries smaller than 1e-15 during iteration
msmtools.estimation.transition_matrix(Csub, reversible=True)

We get vanishing stationary density even though the matrix is connected. I haven't looked into the detailed reason for that (it's a big matrix), but it's easy to think about a pathological case with such a behavior. Imagine a chain of states with this kind of count matrix:

[[0, 10, 0,  0, ...],
 [1,  0,  10,  0, ...],
 [0,  1,   0, 10, ..., ]
 [...]]

Clearly you can get arbitrarily small eq. probabilities in the first state by extending the chain. The equilibrium probability is needed to be nonzero in the iterative solver, so we set a tolerance. Perhaps the tolerance 1e-15 is too careful - in principle it seems to me that we only need to check for >0, but I guess this has been set by either @fabian-paul or @trendelkampschroer , so I'd like them to comment on it.

Alternatively, we could generally beat such cases by using an iteration in log-space (energies), which leads to the same optimizer but can deal with much greater dynamical ranges. But that's much more expensive, so I would only do that as a fall-back solution.

fabian-paul commented 8 years ago

Whoever introduced the check (https://github.com/markovmodel/msmtools/blob/devel/msmtools/estimation/dense/_mle_trev.c#L107) should perhaps comment on this. @trendelkampschroer ? What was the reason to introduce it? Was it because the estimator didn't converge? I'm not sure whether if we really need that strict condition. Not sure either whether log-space can give any better results. 1.E-15 is the safety margin for round-off (not underflow). log space wouldn't help there, but maybe some reordering of algebraic operations.

trendelkampschroer commented 8 years ago

Hi, someone started to observe that for some reversible transition matrix estimate a subsequent eigenvalue decomposition resulted in eigenvalue(s) greater 1. Digging into this problem @marscher and I discovered that this was caused because the stationary probabilities (i.e. the row sum x, x_new) had extremely small values (e.g e^{-3xx}) during the estimation procedure. We decided it was better to issue a warning during the estimation so that cases for which nonsensical eigenvalues would be computed downstream were catched early on. I would suggest playing around with the tolerance eps_mu in msmtools.estimation.sparse.mle_trev and see if the problem goes away. If that is the case and the eigenvalues are still okay than we can try to set a less restrictive default value for eps_mu.

fabian-paul commented 8 years ago

I had a quick look on the MSM estimator with Frank and we noticed, that the stationary distribution that is estimated by mle_trev is not saved in the msm object. See line https://github.com/markovmodel/PyEMMA/blob/devel/pyemma/msm/estimators/maximum_likelihood_msm.py#L268 Maybe fixing this will resolve some numerical issues. In the reversible case, pi then wouldn't have to be estimated from T. Maybe the the solution of the eigenvalue problem could also benefit from a more accurate pi. I tried out Frank's pathological example and I found that mle_trev gives a meaningful stationary distribution but recalculating it from T is not possible.

franknoe commented 8 years ago

Can I ping you guys on this again? I like the idea of saving the stationary distribution in the MSM object, but this wouldn't solve the issue that this failure also occurs in the pure msmtools example given above.

fabian-paul commented 8 years ago

Well, we should try to use the stationary distribution form the reversible estimator for the similarity transform in the solution in the eigenvalue problem. Maybe this would improve things. In principle this could work as a kind of preconditioning of the eigenvalue solver, because we already know one of the eigenvectors. If this is that case, we could set a less restrictive default.

franknoe commented 8 years ago

OK, but in the example posted above already the call

msmtools.estimation.transition_matrix(Csub, reversible=True)

results in the problem that the equilibrium distribution is too small on some states. So that needs to be fixed first. @fabian-paul , are you suggesting to disable this problem by setting the tolerance to a smaller value (e.g. close to double precision), and additionally making eigenvalue decompositions more robust by including the stationary distribution in them? I think that might work within PyEMMA. It wouldn't work in msmtools alone (i.e. if you passed such a matrix on to msmtools.analysis.eigenvalues() without the stationary distribution, you might run into the problem that @trendelkampschroer has described). But I think that would be OK, because as long as msmtools.estimation.transitionmatrix produces a valid transition matrix and the re-estimated equilibrium distribution is again all positive, there is no reason to complain.

LaurenReid94 commented 7 years ago

Hi,

I'm receiving the same error message about the stationary distribution being smaller than 1e-15. I'm just wondering if you guys decided on a work around? Is there a way for the user to specify a smaller cutoff, or is the only solution to use more data?

fabian-paul commented 7 years ago

Hi @LaurenReid94 ,

I think the threshold for the small elements in the stationary distribution can now be decreased to some value smaller than 1e-15. Pyemma has changed since this threshold was introduced. For instance we now have better routines for computing eigendecompositions, that exploit reversibility and use the stationary distribution that is estimated together with the MSM (see comments above). Unfortunately, there is not way (yet) for the user to change this threshold. I think we should add this to the estimate_markov_model() api function.

As Frank pointed out above, small elements in the stationary vector arise naturally in MSMs that model a chain of transitions with a strong bias in the forward (or backward) direction. This code e. g. can trigger the exception.

import numpy as np
import msmtools
c=np.array([[1.E6, 1.E6, 0, 0], [1, 1.E6, 1.E6, 0], [0, 1, 1.E6, 1.E6], [0, 0, 1, 1.E6]])
T=msmtools.estimation.transition_matrix(c, reversible=True)

From the side of the user, some care is necessary in such situations. In the example above, are the transitions in the backward direction (count matrix element = 1) real, or are they artifacts from bad clustering? Right now this has to be checked by the user. You might also experiment a little with the mincount_connectivity parameter of estimate_markov_model.

Fabian

LaurenReid94 commented 7 years ago

Hi @fabian-paul,

Thanks for the explanation. So the current solution would be to increase mincount_connectivity to exclude cluster pairs with few counts, or alternatively to seed new simulations from the underpopulated clusters? If I used the former with reversibility = True, would the mirroring count matrix elements also be set to 0? So in your example, the count matrix would change to look like this:

[[ 1.E6 , 0 , 0 , 0 ], [ 0 , 1.E6 , 0 , 0 ], [ 0 , 0 , 1.E6, 0 ], [ 0 , 0 , 0 , 1.E6 ]]

Thanks, Lauren

pleasanttj commented 4 years ago

I'm running into this problem with pyemma.msm.bayesian_markov_model and/or pyemma.msm.BayesianMSM.cktest (although the error is ignored in the latter).

Based on reading this thread, it seems like the problem occurs when states selected by clustering algorithms are not sufficiently populated and, since this situation can (or at least used to) cause numerical problems, it raises an error. But weirdly enough, with everything else equal, I do not get this problem with 300 clusters, but I do with 200 clusters.

It looks likes setting mincount_connectivity=1 will create a MSM that doesn't run into this problem. That said, if this happens occur within pyemma.msm.BayesianMSM.cktest but not pyemma.msm.bayesian_markov then varying mincount_connectivity is not possible directly.

Edit: It seems like the problem occurs for pyemma.msm.its even if you set mincount_connectivity=1.

thempel commented 4 years ago

Clustering the data with more centers does not necessarily mean that there cannot be single cluster centers that are very sparsely populated. It might be that the discretization with 300 centers is just better, e.g. by covering a larger fraction of sparsely populated space (which can be complicated when the state space density differs very strongly between regions). I see that this error message is a bit unspecific, but my personal experience is that it mostly points towards some actual problems with the data in that particular discretization. You might want to check for connectivity of that low probability state and what portion of the free energy landscape it describes. Changing estimation parameters might somehow circumvent the error being raised, but I'd strongly recommend to understand the problem first and, if necessary, solve later with parameter choices like you were doing with mincound_connectivity.